Abstract

Consider a decision maker who is responsible for dynamically collecting observations so as to enhance his information about an underlying phenomenon of interest in a speedy manner while accounting for the penalty of wrong declaration. Special cases of the problem are shown to be noisy dynamic search and variable-length coding with feedback. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most "informative" sensing action among the available ones.
In this paper, using results in dynamic programming, lower bounds for the optimal total cost are established. The lower bounds characterize the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability. Moreover, upper bounds are obtained via an analysis of two heuristic policies for dynamic selection of actions. It is shown that the first proposed heuristic achieves asymptotic optimality, where the notion of asymptotic optimality, due to Chernoff, implies that the relative difference between the total cost achieved by the proposed policy and the optimal total cost approaches zero as the penalty of wrong declaration (hence the number of collected samples) increases. Furthermore, using the obtained bounds, the gain of adaptive selection of sensing actions is shown to be at least logarithmic in the penalty associated with wrong declarations. The second heuristic is only shown to be asymptotically optimal in the limited setup of binary hypothesis testing, noisy dynamic search, and variable-length coding with feedback. However, by considering the asymptotic regime where the number of hypotheses is also growing, and under a mild technical condition, this second heuristic is shown to achieve a non-zero information acquisition rate, establishing a lower bound for the maximum achievable rate. In the case of variable-length coding with feedback, this non-zero information rate is shown to be maximum (i.e., any information acquisition at a higher rate results in a non-zero probability of error), and the proposed heuristic is proved to achieve Burnashev's optimal error exponent. This result extends the notions of capacity and optimal error exponent to the context of active sequential hypothesis testing.

Index Terms

optimal error exponent, information acquisition rate.

I. Introduction

This paper considers a generalization of the classical sequential hypothesis testing problem due to
Wald [1]. Suppose there are M hypotheses among which only one is true. A Bayesian decision maker is
responsible for enhancing his information about the correct hypothesis in a speedy and sequential manner
while accounting for the penalty of wrong declaration. In contrast to the classical sequential M-ary
hypothesis testing problem [2]-[4], our decision maker can choose one of K available actions and hence
exert some control over the collected samples' "information content." We refer to this generalization,
originally tackled by Chernoff [5], as the active sequential hypothesis testing problem.

The active sequential hypothesis testing problem naturally arises in a broad spectrum of applications
such as medical diagnosis [6], cognition [7], sensor selection [8], underwater inspection [9], generalized
search [10], and channel coding with perfect feedback [11]. It is intuitive that at any time instant, an optimized
Bayesian decision maker relies on his current belief to adaptively select the most "informative"
sensing action, i.e., an action that provides the highest amount of "information." Making this intuition
precise is the topic of our study.

The best-known instance of our problem is the case of binary hypothesis testing with passive
sensing (M = 2, K = 1), first studied by Wald [1]. In this instance of the problem, the optimal
action at any given time is provided by a sequential probability ratio test (SPRT). There are numerous
studies on the generalizations to M > 2 (K = 1) and on the performance of known simple and practical
heuristic tests such as MSPRT [2]-[4]. The generalization to the active testing case was considered by
Chernoff in [5], where a heuristic randomized policy was proposed and its asymptotic performance
analyzed. More specifically, under a certain technical assumption on uniformly distinguishable
hypotheses, the proposed heuristic policy is shown to achieve asymptotic optimality, where the notion
of asymptotic optimality [5] denotes the relative tightness of the performance upper bound associated
with the proposed policy and the lower bound associated with the optimal policy.

The problem of active hypothesis testing also generalizes another classic problem in the literature:
the comparison of experiments, first introduced by Blackwell [12]. This is a single-shot version of the
active hypothesis testing problem in which the decision maker can choose one of several (usually two)
actions/experiments to collect a single observation sample before making the final decision. There have
been extensive studies [12]-[18] on comparing the actions. Applying various results from [12], [15] in
our context of active hypothesis testing and utilizing a dynamic programming interpretation, an optimal
notion of information utility, i.e., an optimal measure to quantify the information gained by different
sensing actions, can be derived [19]. Inspired by this view of the problem, which coincides with that
promoted by DeGroot [20], we provide a set of (uniform) lower bounds for the optimal information utility.
Furthermore, we provide two heuristic policies whose performance is investigated via an asymptotic
analysis. The first policy is similar to the one proposed in [19], [21], and is shown to be asymptotically
optimal, matching the performance of the scheme proposed in [5] (and follow-up works [22], [23]) while
relaxing the technical assumption on uniform distinguishability of the hypotheses. In contrast, our second
proposed policy is only shown to be asymptotically optimal in the limited setup of binary hypothesis
testing, noisy dynamic search, and variable-length coding with feedback. However, this policy has a
provable advantage for large M over our first policy, as well as over other solutions in the literature. More
specifically, this policy can provide, under a mild technical condition, reliability and speedy declaration
simultaneously. In information-theoretic terms, this policy can be shown to achieve a non-zero information
acquisition rate and hence to generalize Burnashev's [11] variable-length channel coding scheme. A more
complete literature survey is given in Section II-B.

The remainder of this paper is organized as follows. In Section II, we formulate the active sequential
hypothesis testing problem and discuss the related works. Section III provides a dynamic programming
formulation and characterizes an optimal notion of information utility. In Section IV, we provide three
lower bounds and two upper bounds on the optimal information utility. The bounds are complementary
for various values of the parameters of the problem. Section V states the consequences of the bounds
obtained in Section IV. In particular, the obtained bounds are used to 1) establish notions of order
and asymptotic optimality for the proposed policies (in Subsection V-A); 2) characterize lower and
upper bounds on the maximum achievable information acquisition rate and the optimal reliability (in
Subsection V-B); and 3) derive the advantage of causally and adaptively selecting sensing actions
over the best open-loop (randomized) selection rule (in Subsection V-C). In Section VI, we investigate
important special cases of active hypothesis testing, including the problems of noisy dynamic search
and variable-length coding with feedback. Finally, we conclude the paper and discuss future work in
Section VII.

Notation: Let $[x]^+ = \max\{x, 0\}$. A random variable is denoted by an upper case letter (e.g. X)
and its realization is denoted by a lower case letter (e.g. x). For any set S, $|S|$ denotes the cardinality
of S. For a set $\mathcal{A}$, let $\Lambda(\mathcal{A})$ denote the collection of all probability distributions on elements of $\mathcal{A}$,
i.e., $\Lambda(\mathcal{A}) = \{\lambda \in [0,1]^{|\mathcal{A}|} : \sum_{a \in \mathcal{A}} \lambda_a = 1\}$. All logarithms are in base 2. The entropy function on a
vector $p = [p_1, p_2, \ldots, p_M] \in [0,1]^M$ is defined as $H(p) = \sum_{i=1}^{M} p_i \log(1/p_i)$, with the convention that
$0 \log\frac{1}{0} = 0$. Finally, the Kullback-Leibler (KL) divergence between two probability density functions
$q(\cdot)$ and $q'(\cdot)$ on space $\mathcal{Z}$ is defined as $D(q\|q') = \int_{\mathcal{Z}} q(z) \log\frac{q(z)}{q'(z)}\,dz$, with the convention $0 \log\frac{a}{0} = 0$
and $b \log\frac{b}{0} = \infty$ for $a, b \in [0,1]$ with $b \neq 0$.

II. Problem Setup and Summary of the Results 

In Subsection II-A, we formulate the problem of active sequential hypothesis testing, referred to
as Problem (P) hereafter. Subsection II-B states the main contributions of the paper and provides a
summary of related works.

A. Problem Formulation 

Here, we provide a precise formulation of our problem. 
Problem (P) [Active Sequential Hypothesis Testing] 

Let $\Omega = \{1, 2, \ldots, M\}$. Let $H_i$, $i \in \Omega$, denote M hypotheses of interest among which only one
holds true. Let $\theta$ be the random variable that takes the value $\theta = i$ on the event that $H_i$ is true
for $i \in \Omega$. We consider a Bayesian scenario with prior belief $p(0) = [p_1(0), p_2(0), \ldots, p_M(0)]$, i.e.,
initially $P(\{\theta = i\}) = p_i(0)$ for all $i \in \Omega$. $\mathcal{A}$ is the set of all sensing actions and is assumed to be
finite with $|\mathcal{A}| = K < \infty$. $\mathcal{Z}$ is the observation space. For all $a \in \mathcal{A}$, the observation kernel $q_i^a(\cdot)$
(on $\mathcal{Z}$) is the probability density function for observation Z when action a has been taken and
$H_i$ is true. We assume that the observation kernels $\{q_i^a(\cdot)\}_{i \in \Omega, a \in \mathcal{A}}$ are known and the observations are
conditionally independent over time. Let L denote the penalty (loss) for a wrong declaration, i.e., the
penalty of selecting $H_j$, $j \neq i$, when $H_i$ is true.^1 Let $\tau$ be the stopping time at which the decision
maker retires. The objective is to find a sequence of sensing actions $A(0), A(1), \ldots, A(\tau - 1)$, a
stopping time $\tau$, and a declaration rule $d : \mathcal{A}^\tau \times \mathcal{Z}^\tau \to \Omega$ that collectively minimize the total cost
$E[\tau + L\,\mathbf{1}\{d(A^\tau, Z^\tau) \neq \theta\}]$, where the expectation is taken with respect to the initial belief as well as
the distribution of the observation sequence. The objective of Problem (P) can be written as

minimize $E[\tau] + L\,\mathrm{Pe}$, (1)

where $\mathrm{Pe} = P(\{d(A^\tau, Z^\tau) \neq \theta\})$ denotes the probability of making a wrong declaration.
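To make the objective (1) concrete, the following is a minimal Monte Carlo sketch, not a method from the paper. It assumes a finite observation alphabet (the paper works with densities), a naive policy that picks sensing actions uniformly at random, and a stopping rule that retires once some belief reaches a hypothetical level `stop_level`. All names and the example kernels are illustrative assumptions.

```python
import random

def simulate_total_cost(kernels, prior, L, stop_level, runs=500, seed=0, max_steps=10_000):
    """Estimate E[tau], Pe and E[tau] + L * Pe for a uniformly random action policy.

    kernels[a][i][z] is q_i^a(z) on a finite alphabet (an assumption of this sketch).
    """
    rng = random.Random(seed)
    M = len(prior)
    total_tau, errors = 0, 0
    for _ in range(runs):
        theta = rng.choices(range(M), weights=prior)[0]  # draw the true hypothesis
        p = list(prior)
        tau = 0
        while max(p) < stop_level and tau < max_steps:
            q = kernels[rng.randrange(len(kernels))]      # uniformly random sensing action
            z = rng.choices(range(len(q[theta])), weights=q[theta])[0]
            qp = sum(pi * qi[z] for pi, qi in zip(p, q))  # predictive probability of z
            p = [pi * qi[z] / qp for pi, qi in zip(p, q)]  # Bayes update of the belief
            tau += 1
        declared = max(range(M), key=lambda i: p[i])      # declare the most likely hypothesis
        total_tau += tau
        errors += declared != theta
    mean_tau = total_tau / runs
    pe = errors / runs
    return mean_tau, pe, mean_tau + L * pe

# Illustrative kernels: 2 actions, 3 hypotheses, binary observations.
example_kernels = [
    [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]],  # action 0 separates H_0 from the rest
    [[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]],  # action 1 separates H_2 from the rest
]
L = 100.0
mean_tau, pe, cost = simulate_total_cost(example_kernels, [1 / 3] * 3, L, stop_level=1 - 1 / L)
```

The random-action policy is only a baseline; the point of the paper is that choosing actions adaptively from the current belief can reduce this cost.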

B. Overview of the Results and Summary of the Related Works 

The first attempt to solve Problem (P) goes back to Chernoff's work on active binary composite
hypothesis testing [5]. Chernoff proposed the following scheme to select actions: at each time t, find
the most likely true hypothesis and then select an action that best discriminates this hypothesis
from each and every element in the set of alternative hypotheses. Chernoff showed that as L
goes to infinity, the relative difference between the total cost achieved by his proposed scheme and
the optimal total cost approaches zero, which he termed asymptotic optimality.^2 One of the main
drawbacks of Chernoff's notion of asymptotic optimality is that it neglects the complementary role of
asymptotic analysis in M. In particular, the notion of asymptotic optimality in L falls short in showing the
tension between using an (asymptotically large) number of samples to discriminate among a few hypotheses
with (asymptotically) high accuracy or among an (asymptotically) large number of hypotheses with a lower
degree of accuracy. Although the scheme proposed in [5] and its subsequent extensions [22]-[29] are
asymptotically optimal in L, their provable information acquisition rate is restricted to zero (a potentially
unbounded number of samples is used to acquire log M bits of information). Intuitively, the rate of
information acquisition under any given heuristic relates to the ratio between log M and the expected
number of samples: the larger this ratio, the faster information is acquired.

^1 In general, we can define a loss matrix $[L_{ij}]_{i,j \in \Omega}$, where $L_{ij}$ denotes the penalty (loss) of selecting $H_j$ when $H_i$ is true.

As elaborated in Section V-B, to obtain an asymptotic characterization of the expected optimal cost in
a non-zero information rate regime, it is important to propose schemes that scale optimally with M
as well. In his seminal paper [11], Burnashev tackled the primal (constrained) version of Problem (P)
in the context of channel coding with feedback (in Section VI-C we explain why channel coding with
feedback can be interpreted as a special case of Problem (P)) and provided lower and upper bounds on
the expected number of samples (or, equivalently, channel uses) required to convey one of M uniformly
distributed messages with a desired probability of error. The lower bound identified the dominating
terms in both the number of messages and the error probability, and hence characterized the optimal reliability
function (also known as the error exponent) in addition to the feedback capacity (which was known to
coincide with the Shannon capacity [30]). In this paper, we generalize this lower bound to the problem
of active sequential hypothesis testing, i.e., Problem (P):

• For all achievable rates, we derive three lower bounds on the expected total cost (1). The bounds
hold for all prior beliefs and are complementary for various values of L and M. These bounds
are collectively used to generalize the (information-theoretic) notions of achievable communication
rate [31] and error exponent [32] to the context of active sequential hypothesis testing.

• The first and third lower bounds identify the dominating terms in L and hence are useful in
establishing asymptotic optimality of order-1 (due to Chernoff [5]) and order-2 in L. Furthermore,
from an information-theoretic viewpoint, these bounds are used to characterize an upper bound on
the reliability function (error exponent) at zero information rate.

• We use the second lower bound as a converse (in a fashion somewhat similar to Shannon's
channel coding converse [31]) to derive an upper bound $I_{\max}$ on the maximum achievable information
acquisition rate. Additionally, this lower bound allows us to 1) provide an upper bound on
the reliability function (error exponent) for all rates $R \in [0, I_{\max}]$, and 2) establish order optimality
in M as a necessary condition for any policy that achieves a non-zero information acquisition rate.

^2 In [5], the objective was to minimize $c\,E[\tau] + \mathrm{Pe}$ and the proposed policy was shown to be asymptotically optimal as $c \to 0$. It
is straightforward to show that for $L = 1/c$, this problem coincides with Problem (P) defined in this paper. However, we have chosen
$E[\tau] + L\,\mathrm{Pe}$ as the objective function for Problem (P) because of its interpretation as a Lagrangian relaxation of an information acquisition
problem in which the objective is to minimize $E[\tau]$ subject to $\mathrm{Pe} \le \epsilon$, where $\epsilon > 0$ denotes the desired probability of error.

In addition to a lower bound on the expected number of samples, Burnashev proposed a coding scheme
with two phases of operation whose performance provides a tight upper bound (in both L and M). It is
interesting to note that the scheme of Chernoff, if specialized to channel coding with feedback, coincides
with the second phase of Burnashev's scheme. However, while the first phase of Burnashev's scheme
can achieve any information rate up to the capacity of the channel, Chernoff's one-phase scheme has a
rate of information acquisition equal to zero. Inspired by Burnashev's coding scheme, we also obtain
two heuristic two-phase policies $\tilde{\pi}_1$ and $\tilde{\pi}_2$:

• Policy $\tilde{\pi}_1$, in its first phase, selects actions in a way that all pairs of hypotheses can be distinguished
from each other, while its second phase coincides with Chernoff's scheme [5], where only the pairs
including the most likely hypothesis are considered. The second phase of policy $\tilde{\pi}_1$ is shown to
ensure its asymptotic optimality in L, while its first phase, in a very natural manner, relaxes the
technical assumption in [5] that all actions discriminate between all hypothesis
pairs, as well as the need for the infinitely often reliance on randomized actions deployed in [23] in order
to ensure sufficient discrimination among hypotheses.

• Policy $\tilde{\pi}_2$, in contrast, is only shown to be asymptotically optimal in three important special cases
discussed in Section VI. However, under a mild technical condition, policy $\tilde{\pi}_2$ can ensure that
information acquisition occurs at a non-zero rate. Mathematically, this means that, under policy
$\tilde{\pi}_2$, the expected total cost grows in L and M in an order optimal fashion, establishing a lower
bound on the maximum achievable information acquisition rate, $\underline{I}_2 \le I_{\max}$, as well as a lower bound
on the optimal reliability function (optimal error exponent) for all rates $R \in [0, \underline{I}_2]$. As a corollary
to the asymptotic optimality result of Section VI, we recover Burnashev's optimal reliability in
the case of variable-length coding with feedback.

The above results are also used to answer a fundamental question regarding the significance of adaptive
decision making. In particular, specializing the obtained lower bounds to the open-loop setup along with
our policy $\tilde{\pi}_1$, we investigate the benefit of adapting sensing actions at any decision epoch. We show
that in almost all practical settings the adaptivity gain grows logarithmically with the penalty L. Such a
characterization complements the more recent work on multi-stage policies (introduced in [33], [34])
in which the decision maker can take a retire/declare action only at the end of each stage (of potentially
unequal length). Extending our characterization of the adaptivity gain to quantifying the loss in
performance introduced by the multi-stage decision making constraint remains an area of future work.

III. Dynamic Programming and Characterization of an Optimal Policy 

In this section, we first derive the corresponding dynamic programming (DP) equation for Problem (P). 
From the DP solution, we characterize an optimal policy for Problem (P). 

The problem of active M-ary hypothesis testing is a partially observable Markov decision problem
(POMDP) where the state is static and observations are noisy. It is known that any POMDP is equivalent
to an MDP with a compact yet uncountable state space, for which the belief of the decision maker about
the underlying state becomes an information state [35]. In our setup, thus, the information state at time
t is nothing but the belief vector $p(t)$ whose i-th element is the conditional probability of hypothesis $H_i$
being true given the initial belief and all the observations and actions up to time t. Accordingly, the
information state space is defined as $\mathbb{P}(\Theta) := \{p \in [0,1]^M : \sum_{i=1}^{M} p_i = 1\}$, where $\Theta$ is the $\sigma$-algebra
generated by random variable $\theta$. In one sensing step, the evolution of the belief vector follows Bayes'
rule and is given by $\Phi^a$, a measurable function from $\mathbb{P}(\Theta) \times \mathcal{Z}$ to $\mathbb{P}(\Theta)$ for all $a \in \mathcal{A}$:

$$\Phi^a(p, z) = \left[ p_1 \frac{q_1^a(z)}{q_p^a(z)},\; p_2 \frac{q_2^a(z)}{q_p^a(z)},\; \ldots,\; p_M \frac{q_M^a(z)}{q_p^a(z)} \right], \quad (2)$$

where $q_p^a(z) = \sum_{i=1}^{M} p_i q_i^a(z)$, and $\Phi^a(p, z) = p$ if $q_p^a(z) = 0$. In other words, if $p \in \mathbb{P}(\Theta)$ is an a priori
distribution, $\Phi^a(p, z)$ gives the posterior distribution when sensing action a has been taken and z
has been observed. Note that the posterior distribution depends strongly on the sensing action a.
We define the operator $T^a$, $a \in \mathcal{A}$, such that for any measurable function $g : \mathbb{P}(\Theta) \to \mathbb{R}$,

$$(T^a g)(p) := \int g(\Phi^a(p, z))\, q_p^a(z)\, dz. \quad (3)$$

Given that p is an a priori distribution and action a has been taken, $(T^a g)(p)$ is the expected value of
function g at the posterior belief, where the computation of the posterior belief follows Bayes' rule as
shown in (2). Note that using the operator $T^a$, one can compute the mutual information between $\theta$ with
distribution p and observation Z under action a with distribution $q_p^a$, i.e.,

$$I(p; q_p^a) := H(p) - \int H(\Phi^a(p, z))\, q_p^a(z)\, dz = H(p) - (T^a H)(p).$$
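The belief update (2), the operator in (3), and the mutual information identity can be sketched numerically as follows. This is an illustration only: it assumes a finite observation alphabet, so the integral becomes a sum, and the function names and the binary symmetric example kernel are hypothetical.

```python
import math

def bayes_update(p, kernels_a, z):
    """Posterior Phi^a(p, z); kernels_a[i][z] plays the role of q_i^a(z)."""
    qp = sum(pi * qi[z] for pi, qi in zip(p, kernels_a))
    if qp == 0.0:
        return list(p)  # convention from (2): belief is unchanged when q_p^a(z) = 0
    return [pi * qi[z] / qp for pi, qi in zip(p, kernels_a)]

def entropy(p):
    """Shannon entropy in bits, with the convention 0 log(1/0) = 0."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0.0)

def expected_after(g, p, kernels_a):
    """(T^a g)(p): expected value of g at the posterior belief (sum replaces the integral)."""
    total = 0.0
    for z in range(len(kernels_a[0])):
        qp = sum(pi * qi[z] for pi, qi in zip(p, kernels_a))
        if qp > 0.0:
            total += qp * g(bayes_update(p, kernels_a, z))
    return total

def mutual_information(p, kernels_a):
    """I(p; q_p^a) = H(p) - (T^a H)(p)."""
    return entropy(p) - expected_after(entropy, p, kernels_a)

# Two hypotheses observed through a binary symmetric kernel with crossover 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]
posterior = bayes_update(prior, bsc, 0)
info = mutual_information(prior, bsc)
```

For this symmetric example with a uniform prior, `info` equals one minus the binary entropy of 0.1, which matches the capacity of a binary symmetric channel.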



Fact 1 (Consequence of Propositions 9.8 and 9.10 in [36]). Let $V^* : \mathbb{P}(\Theta) \to \mathbb{R}_+$ be the minimal
solution to the following fixed point equation:

$$V^*(p) = \min\left\{ 1 + \min_{a \in \mathcal{A}} (T^a V^*)(p),\; \min_{j \in \Omega} (1 - p_j) L \right\}. \quad (4)$$

Then $V^*(p(0))$, referred to as the optimal value function, is equal to the minimum cost in Problem (P)
with the prior belief $p(0)$.

Definition 1. A Markov stationary policy is a stochastic kernel from the information state space $\mathbb{P}(\Theta)$
to $\mathcal{A} \cup \{d\}$ describing the conditional distribution on sensing actions $A(t)$, $t = 0, 1, \ldots, \tau - 1$, and
stopping time $\tau$ (the choice of declaration d marks the stopping time $\tau$). In other words, under policy
$\pi$, the probability that action a is selected at belief state p is given by $\pi(a|p)$.

Definition 2. A policy $\pi$ is referred to as Markov stationary deterministic if for each $p \in \mathbb{P}(\Theta)$, there
exists an action $a \in \mathcal{A} \cup \{d\}$ for which $\pi(a|p) = 1$.

As shown in Corollary 9.12.1 in [36], equation (4) provides a characterization of an optimal Markov
stationary deterministic policy $\pi^*$ for Problem (P) as follows: sensing action $a^* = \arg\min_{a \in \mathcal{A}} (T^a V^*)(p)$
is the least costly sensing action, resulting in cost $1 + \min_{a \in \mathcal{A}} (T^a V^*)(p)$, and is the optimal action to take unless
the penalty of wrongly declaring $H_{i^*}$, where $i^* = \arg\min_{j \in \Omega} (1 - p_j) L$, is even less costly, in which case
it is optimal to retire and declare $H_{i^*}$ as the true hypothesis.
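The fixed point equation (4) can be approximated numerically in small cases. The sketch below is an assumption-laden illustration rather than the paper's approach: it takes M = 2 so that the belief is a scalar, uses finite observation alphabets, discretizes the belief on a grid with linear interpolation, and iterates the right-hand side of (4) starting from zero for a fixed number of sweeps. The grid size, sweep count, and example kernels are arbitrary choices.

```python
def value_iteration(kernels, L, grid_size=101, sweeps=100):
    """Approximate V* of (4) for M = 2; kernels[a][i][z] = q_i^a(z), belief x = P(theta = first hypothesis)."""
    xs = [k / (grid_size - 1) for k in range(grid_size)]
    stop = [min(x, 1.0 - x) * L for x in xs]  # min_j (1 - p_j) L for two hypotheses
    V = [0.0] * grid_size                      # start below the minimal solution

    def interp(values, x):
        # Linear interpolation of a grid function at belief x in [0, 1].
        pos = x * (grid_size - 1)
        lo = min(int(pos), grid_size - 2)
        w = pos - lo
        return (1.0 - w) * values[lo] + w * values[lo + 1]

    for _ in range(sweeps):
        new_V = []
        for x, s in zip(xs, stop):
            best = float("inf")
            for q in kernels:  # one entry per sensing action
                cont = 0.0
                for z in range(len(q[0])):
                    qp = x * q[0][z] + (1.0 - x) * q[1][z]
                    if qp > 0.0:
                        cont += qp * interp(V, x * q[0][z] / qp)  # (T^a V)(p)
                best = min(best, cont)
            new_V.append(min(1.0 + best, s))  # continue sensing versus retire and declare
        V = new_V
    return xs, V

# Two illustrative actions of different quality, binary observations.
kernels = [
    [[0.8, 0.2], [0.2, 0.8]],
    [[0.6, 0.4], [0.4, 0.6]],
]
L = 100.0
beliefs, V = value_iteration(kernels, L)
```

A fixed number of sweeps gives no convergence guarantee, and the interpolation introduces discretization error, so the output should be read as a rough picture of where retiring becomes preferable to sensing.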

Remark 1. It follows from (4) that if $\min_{j \in \Omega} (1 - p_j) L \le 1$, then $V^*(p) = \min_{j \in \Omega} (1 - p_j) L$ and hence, the
further reduction of the probability of error is not worth taking one more sensing action. Therefore, the
region of interest in our analysis is restricted to $L > 1$ and $\mathbb{P}_L(\Theta) := \{p \in \mathbb{P}(\Theta) : \min_{j \in \Omega} (1 - p_j) L > 1\}$.

Before we close this section, we provide the following lemma. 

Lemma 1. Suppose there exists a functional $V : \mathbb{P}(\Theta) \to \mathbb{R}_+$ such that for all belief vectors $p \in \mathbb{P}(\Theta)$,

$$V(p) \le \min\left\{ 1 + \min_{a \in \mathcal{A}} (T^a V)(p),\; \min_{j \in \Omega} (1 - p_j) L \right\}.$$

Then $V(p) \le V^*(p)$ for all $p \in \mathbb{P}(\Theta)$, where $V^*$ is a fixed point solution to (4).
The proof is provided in Appendix I.

IV. Performance Bounds 

As discussed earlier, finding an optimal policy $\pi^*$ for Problem (P) requires knowledge of the optimal
value function $V^*$. In lieu of numerical approximation of (4) using value iteration techniques [37], or
deriving a closed form for $V^*$, in Subsections IV-A and IV-B we use Lemma 1 and heuristic policies
to find lower and upper bounds for the value function $V^*$, respectively.
We have the following technical assumptions.

Assumption 1. For any two hypotheses i and j, $i \neq j$, there exists an action a, $a \in \mathcal{A}$, such that
$D(q_i^a \| q_j^a) > 0$.

Assumption 2. There exists $\xi < \infty$ such that $\max_{i,j \in \Omega} \max_{a \in \mathcal{A}} \sup_{z \in \mathcal{Z}} \frac{q_i^a(z)}{q_j^a(z)} \le \xi$.

Assumption 1 ensures the possibility of discrimination between any two hypotheses, hence ensuring
that Problem (P) has a meaningful solution. Assumption 2 implies that no two hypotheses are fully
distinguishable using a single observation sample. Assumption 2 is a technical one for ease of our
proofs.
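For a given set of kernels, both assumptions can be checked directly. The following sketch assumes finite observation alphabets and a hypothetical data layout `kernels[a][i][z]`; it reports whether every pair of hypotheses is separated by some action (Assumption 1) and the smallest constant that bounds all likelihood ratios (Assumption 2, infinite if the assumption fails).

```python
import math

def kl(q, r):
    """D(q || r) in bits on a finite alphabet, with the conventions of the Notation paragraph."""
    total = 0.0
    for qz, rz in zip(q, r):
        if qz > 0.0:
            if rz == 0.0:
                return math.inf
            total += qz * math.log2(qz / rz)
    return total

def check_assumptions(kernels):
    """Return (assumption_1_holds, xi) for kernels[a][i][z] = q_i^a(z)."""
    M = len(kernels[0])
    a1 = all(
        any(kl(q[i], q[j]) > 0.0 for q in kernels)  # some action separates i from j
        for i in range(M) for j in range(M) if i != j
    )
    xi = 0.0
    for q in kernels:
        for i in range(M):
            for j in range(M):
                for qi, qj in zip(q[i], q[j]):
                    if qi > 0.0:
                        # An unbounded ratio means Assumption 2 fails.
                        xi = math.inf if qj == 0.0 else max(xi, qi / qj)
    return a1, xi

example_kernels = [
    [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]],  # action 0 cannot separate H_1 from H_2
    [[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]],  # action 1 cannot separate H_0 from H_1
]
a1_holds, xi = check_assumptions(example_kernels)
```

In this example neither action alone distinguishes all pairs, yet Assumption 1 holds because each pair is separated by at least one action.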



A. Lower Bounds for V* 

Proposition 1. Under Assumption 1 and for $L > 1$, $V^*(p) \ge \underline{V}_1(p)$ where

$$\underline{V}_1(p) := \left[ \sum_{i=1}^{M} p_i \max_{j \neq i} \frac{\log\frac{1 - L^{-1}}{L^{-1}} - \log\frac{p_i}{p_j}}{\max_{a \in \mathcal{A}} D(q_i^a \| q_j^a)} - K_1' \right]^+$$

and $K_1'$ is a constant independent of L.

The proof of Proposition 1 is based on Lemma 1 and is provided in Appendix II-A.
Next we provide another lower bound which is more appropriate for large values of M. Recall that
$I(p; q_p^a) = H(p) - (T^a H)(p)$ denotes the mutual information between $\theta \sim p$ and observation Z under
action a. Let

$$D_{\max} := \max_{i,j \in \Omega} \max_{a \in \mathcal{A}} D(q_i^a \| q_j^a), \qquad I_{\max} := \max_{a \in \mathcal{A}} \max_{p \in \mathbb{P}(\Theta)} I(p; q_p^a),$$

and $\alpha(L, M) :=$ [expression illegible in the source].

It can be easily shown that $D_{\max}$ and $I_{\max}$ are non-decreasing in M and $I_{\max} \le D_{\max} \le \log \xi$ for
all M. Moreover, let

$$\overline{I}_{\max} := \sup_{M} I_{\max}, \qquad \overline{D}_{\max} := \sup_{M} D_{\max}.$$


Proposition 2. Under Assumption 1 and for $L > 1$,

$$V^*(p) \ge \left[ \frac{H(p) - H([\alpha(L, M), 1 - \alpha(L, M)]) - \alpha(L, M) \log(M - 1)}{I_{\max}} + \alpha(L, M) L \right]^+.$$

Furthermore, under Assumption 2 and for $L > \frac{\log M}{I_{\max}}$ and arbitrary $\delta \in (0, 0.5]$,

$$V^*(p) \ge \underline{V}_2(p) := \left[ \frac{H(p) - H([\delta, 1 - \delta]) - \delta \log(M - 1)}{I_{\max}} + \frac{\log\frac{1 - L^{-1}}{L^{-1}} - \log\frac{1 - \delta}{\delta}}{D_{\max}} \mathbf{1}_{\{\max_{i} p_i \le 1 - \delta\}} - K_2' \right]^+$$

where $K_2'$ is a constant independent of L and M.

The proof of Proposition 2 is based on Lemma 1 and is provided in Appendix II-B.
Proposition 2 can be used to show that when $L < \frac{\log M}{I_{\max}}$, Problem (P) will have a trivial solution. The
precise statement is given by the following corollary.

Corollary 1. Let $L < \frac{\log M}{I_{\max}}$, and suppose the decision maker has a uniform prior belief about the
hypotheses. As $M \to \infty$, the optimal policy randomly guesses the true hypothesis without collecting
any observation and hence, Pe, the probability of making a wrong declaration, approaches 1.

The proof of Corollary 1 is provided in Appendix II-C.

Corollary 2. For L > max{l, ^j^} and 6 = ^^^^, 

'h{p) , logi^^ 



V2{P) 



Dr 



{max Pi <l—-i — J^, ,, I 



O (log log ML) 



Remark 2. The lower bounds in Propositions 1 and 2 can be explained by the following intuition:
For any measure of uncertainty $U(\cdot) : \mathbb{P}(\Theta) \to \mathbb{R}_+$, the number of samples required to reduce the
uncertainty down to a target level $U(p_{\text{target}})$ has to be at least $\frac{U(p) - U(p_{\text{target}})}{\Delta_{\max}(U)}$, where $\Delta_{\max}(U)$ is the
maximum amount of reduction in U associated with a single sample, i.e.,
$\Delta_{\max}(U) = \max_{a \in \mathcal{A}} \max_{p \in \mathbb{P}(\Theta)} \{U(p) - (T^a U)(p)\}$. The lower bound in Proposition 1 is associated with such a lower bound when taking U to
be the log-likelihood function, while the lower bound in Proposition 2 is associated with setting U to
be the Shannon entropy.
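The counting argument in Remark 2 can be illustrated for the entropy case with a few lines of arithmetic. This is a sketch of the heuristic only, not of the proofs: the per-sample reduction `i_max` is supplied by hand (here the mutual information of a binary symmetric kernel with crossover 0.1, an illustrative assumption), and the function name is hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0.0)

def min_samples_entropy(p, target_entropy, i_max):
    """(U(p) - U(p_target)) / Delta_max(U) with U the Shannon entropy and Delta_max = i_max."""
    return max(0.0, (entropy(p) - target_entropy) / i_max)

M = 1024
uniform_prior = [1.0 / M] * M
# Assumed per-sample information: 1 - h(0.1) bits.
i_max = 1.0 + 0.9 * math.log2(0.9) + 0.1 * math.log2(0.1)
bound = min_samples_entropy(uniform_prior, target_entropy=0.0, i_max=i_max)
```

With ten bits of initial uncertainty and roughly 0.53 bits gained per sample at best, at least about nineteen samples are needed, which is the kind of $\log M / I_{\max}$ scaling that appears in Proposition 2.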



Fact 2 (Proposition 3 in [EH). Under Assumptions 1 and 2 and for $L > 1$,

l-L-i 



V*{p) > 



M 



E« 



i=l 



log 



max log — 



o(M log L) 



max min ^ XaD{q^\\q^) ( max minmin V XaDiqfWq''))' 



where $\frac{o(M \log L)}{M \log L} \to 0$ as $ML \to \infty$.

B. Upper Bounds for V*

Next we propose two Markov policies $\tilde{\pi}_1$ and $\tilde{\pi}_2$. Both policies have two operational phases.
Phase 1 is the phase in which the belief about every hypothesis is below a certain threshold, while in
phase 2, the belief about one of the hypotheses has passed that threshold and actions are selected in
favor of that particular hypothesis. The difference between the two policies is in the actions they take
in each phase.

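Let $\mu_0$ and $\eta_0$ be vectors in $\Lambda(\mathcal{A})$ such that

$$\mu_0 := \arg\max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a),$$

$$\eta_0 := \arg\max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} \min_{\hat{p} \in \mathbb{P}(\Theta)} \sum_{a \in \mathcal{A}} \lambda_a D\Big(q_i^a \,\Big\|\, \sum_{j \neq i} \frac{\hat{p}_j}{1 - \hat{p}_i} q_j^a\Big).$$

For $i \in \Omega$, let $\mu_i$ and $\eta_i$ be vectors in $\Lambda(\mathcal{A})$ such that

$$\mu_i := \arg\max_{\lambda \in \Lambda(\mathcal{A})} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a),$$

$$\eta_i := \arg\max_{\lambda \in \Lambda(\mathcal{A})} \min_{\hat{p} \in \mathbb{P}(\Theta)} \sum_{a \in \mathcal{A}} \lambda_a D\Big(q_i^a \,\Big\|\, \sum_{j \neq i} \frac{\hat{p}_j}{1 - \hat{p}_i} q_j^a\Big).$$

Consider a threshold $\tilde{\rho}$, $\tilde{\rho} \in (\frac{1}{2}, 1 - L^{-1})$. Markov (randomized) policies $\tilde{\pi}_1$ and $\tilde{\pi}_2$ are defined as
follows:^3

• If $p_i \ge 1 - L^{-1}$, retire and select $H_i$ as the true hypothesis;

• If $p_i \in [\tilde{\rho}, 1 - L^{-1})$, then

- $\tilde{\pi}_1(a|p) = \mu_{ia}$ $\forall a \in \mathcal{A}$;

- $\tilde{\pi}_2(a|p) = \eta_{ia}$ $\forall a \in \mathcal{A}$;

• If $p_i \in [0, \tilde{\rho})$ for all $i \in \Omega$, then

- $\tilde{\pi}_1(a|p) = \mu_{0a}$ $\forall a \in \mathcal{A}$;

- $\tilde{\pi}_2(a|p) = \eta_{0a}$ $\forall a \in \mathcal{A}$.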
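The three-case rule above translates into a short selection routine. The sketch below is illustrative and assumes the randomization vectors have already been computed elsewhere; passing the $\mu$ vectors gives the first policy and passing the $\eta$ vectors gives the second. The function and argument names are hypothetical.

```python
import random

def select_action(p, lam0, lam, L, threshold, rng=random):
    """One decision of the two-phase policy.

    p: current belief vector.
    lam0: phase-1 randomization over actions (mu_0 or eta_0).
    lam[i]: phase-2 randomization when hypothesis i leads (mu_i or eta_i).
    threshold: a value in (1/2, 1 - 1/L), so at most one belief can exceed it.
    Returns ('declare', i) or ('sense', a).
    """
    i_star = max(range(len(p)), key=lambda i: p[i])
    if p[i_star] >= 1.0 - 1.0 / L:
        return ("declare", i_star)  # retire and declare H_i
    weights = lam[i_star] if p[i_star] >= threshold else lam0
    a = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return ("sense", a)

# Degenerate randomizations make the example deterministic.
lam0 = [1.0, 0.0]
lam = [[0.0, 1.0], [0.0, 1.0]]
decision = select_action([0.7, 0.3], lam0, lam, L=100.0, threshold=0.6)
```

Because the threshold exceeds one half, the leading hypothesis is unique whenever phase 2 is active, which is why a single `max` suffices here.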

Remark 3. Policies $\tilde{\pi}_1$ and $\tilde{\pi}_2$ are equivalent for M = 2.

Remark 4. The performance of policies $\tilde{\pi}_1$ and $\tilde{\pi}_2$ depends highly on the problem at hand. Sections V-A
and V-B will elaborate on this.

^3 Policies $\tilde{\pi}_1$ and $\tilde{\pi}_2$ are not unique; they each represent a class of parameterized policies. The tilde in $\tilde{\pi}_1$ and $\tilde{\pi}_2$ is to emphasize the
dependency of these policies on the threshold/parameter $\tilde{\rho}$.

Propositions 3 and 4 at the end of this section provide two upper bounds $\overline{V}_1$ and $\overline{V}_2$ for the value
function $V^*$. For notational simplicity, let

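$$I_1 := \max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a),$$

$$I_2 := \max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} \min_{\hat{p} \in \mathbb{P}(\Theta)} \sum_{a \in \mathcal{A}} \lambda_a D\Big(q_i^a \,\Big\|\, \sum_{j \neq i} \frac{\hat{p}_j}{1 - \hat{p}_i} q_j^a\Big),$$

$$D_{\mu_i} := \max_{\lambda \in \Lambda(\mathcal{A})} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a),$$

$$D_{\eta_i} := \max_{\lambda \in \Lambda(\mathcal{A})} \min_{\hat{p} \in \mathbb{P}(\Theta)} \sum_{a \in \mathcal{A}} \lambda_a D\Big(q_i^a \,\Big\|\, \sum_{j \neq i} \frac{\hat{p}_j}{1 - \hat{p}_i} q_j^a\Big).$$

It can be easily shown that $I_1$, $I_2$, $D_{\mu_i}$, $D_{\eta_i}$, $\forall i \in \Omega$, are non-increasing in M and $I_1 \le D_{\mu_i} \le D_{\max}$
and $I_2 \le D_{\eta_i} \le D_{\max}$ for all M. Note that Tables I and II provide, respectively, a list of the notations
introduced in this section and their limiting values.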
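The max-min quantities $I_1$ and $D_{\mu_i}$ and the ordering $I_1 \le D_{\mu_i} \le D_{\max}$ can be explored numerically. The sketch below is a crude illustration under stated assumptions: finite observation alphabets, exactly K = 2 actions so that the randomization is a single number scanned on a grid, and hypothetical example kernels. A general solver would use linear programming instead of a grid scan.

```python
import math

def kl(q, r):
    """D(q || r) in bits on a finite alphabet."""
    total = 0.0
    for qz, rz in zip(q, r):
        if qz > 0.0:
            if rz == 0.0:
                return math.inf
            total += qz * math.log2(qz / rz)
    return total

def max_min_divergence(kernels, pairs, steps=1000):
    """max over lambda = (t, 1 - t) of min over (i, j) in pairs of sum_a lambda_a D(q_i^a || q_j^a)."""
    d = [[kl(q[i], q[j]) for (i, j) in pairs] for q in kernels]  # d[a][pair index]
    best = -1.0
    for s in range(steps + 1):
        t = s / steps
        worst = min(t * d0 + (1.0 - t) * d1 for d0, d1 in zip(d[0], d[1]))
        best = max(best, worst)
    return best

example_kernels = [
    [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]],
    [[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]],
]
M = 3
all_pairs = [(i, j) for i in range(M) for j in range(M) if i != j]
I1 = max_min_divergence(example_kernels, all_pairs)
D_mu = [max_min_divergence(example_kernels, [(i, j) for j in range(M) if j != i]) for i in range(M)]
D_max = max(kl(q[i], q[j]) for q in example_kernels for (i, j) in all_pairs)
```

In this example no single action separates every pair, so the maximizing randomization for `I1` must mix the two actions.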

Proposition 3. Under Assumptions 1 and 2 and for $L > 1$ and any $p \in \mathbb{P}_L(\Theta)$,



- H{p) + \og{M -1) v-^ ^ ^^ fc^i ^Pfe oM + logL 
V'llp)- ^ + 2^ Pi n + 



is an upper bound for the optimal value function $V^*$, where $\frac{o(M + \log L)}{M + \log L} \to 0$ as $ML \to \infty$.





log 


l-L-i 


+ 


h 




^., 







i j=l /^« 1 

M+log L 

The proof is done by analyzing the performance of $\tilde{\pi}_1$ and is provided in Appendix III-A.

Proposition 4. Under Assumptions 1 and 2 and for $L > 1$ and any $p \in \mathbb{P}_L(\Theta)$,

-'2 

is an upper bound for the optimal value function $V^*$ and $K_2$ is a constant independent of L and M.
The proof is done by analyzing the performance of $\tilde{\pi}_2$ and is provided in Appendix III-B.

V. Applications and Technical Consequences of the Bounds 
In this section, we state and discuss the consequences of the bounds obtained in Section IV.

A. Order and Asymptotic Optimality 

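The lower and upper bounds provided in Section IV can be applied to establish the order optimality
and asymptotic optimality of the proposed policies as defined below.

TABLE I
Summary of notations

Notation        Description
$I_{\max}$      $\max_{a \in \mathcal{A}} \max_{p \in \mathbb{P}(\Theta)} I(p; q_p^a)$
$D_{\max}$      $\max_{i,j \in \Omega} \max_{a \in \mathcal{A}} D(q_i^a \| q_j^a)$
$I_1$           $\max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a)$
$I_2$           $\max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} \min_{\hat{p} \in \mathbb{P}(\Theta)} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| \sum_{j \neq i} \frac{\hat{p}_j}{1 - \hat{p}_i} q_j^a)$
$D_{\mu_i}$     $\max_{\lambda \in \Lambda(\mathcal{A})} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a)$
$D_{\eta_i}$    $\max_{\lambda \in \Lambda(\mathcal{A})} \min_{\hat{p} \in \mathbb{P}(\Theta)} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| \sum_{j \neq i} \frac{\hat{p}_j}{1 - \hat{p}_i} q_j^a)$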


Definition 3. Let $V^\pi(p)$ denote the value function for policy $\pi$, i.e., the expected total cost achieved
by policy $\pi$ when the initial belief is p. For fixed M, policy $\pi$ is referred to as order optimal in L if
for all $p \in \mathbb{P}(\Theta)$,

$$\lim_{L \to \infty} \frac{V^\pi(p) - V^*(p)}{V^\pi(p)} < 1.$$

Definition 4. For fixed M, policy $\pi$ is referred to as asymptotically optimal of order-1 in L if for all
$p \in \mathbb{P}(\Theta)$,

$$\lim_{L \to \infty} \frac{V^\pi(p) - V^*(p)}{V^\pi(p)} = 0.$$

Definition 5. For fixed M, policy $\pi$ is referred to as asymptotically optimal of order-2 in L if for all
$p \in \mathbb{P}(\Theta)$, there exists a constant B independent of L such that

$$V^\pi(p) - V^*(p) \le B.$$

Remark 5. It is clear from the definitions above that order optimality is weaker than asymptotic
optimality of order-1, while asymptotic optimality of order-2 is the strongest notion. The notion of
asymptotic optimality of order-1 was first introduced in [5], which naturally motivates the extension to
higher orders.

The next definition extends the notions of order and asymptotic optimality defined above to the case
where M increases as well.

TABLE II
Summary of limiting values

Notation                       Description              Notation                      Description
$\underline{I}_{\max}$         $\inf_M I_{\max}$        $\overline{I}_{\max}$         $\sup_M I_{\max}$
$\underline{D}_{\max}$         $\inf_M D_{\max}$        $\overline{D}_{\max}$         $\sup_M D_{\max}$
$\underline{I}_1$              $\inf_M I_1$             $\overline{I}_1$              $\sup_M I_1$
$\underline{I}_2$              $\inf_M I_2$             $\overline{I}_2$              $\sup_M I_2$



Definition 6. Let $p_u$ denote a uniform prior on the set of hypotheses. Policy $\pi$ is referred to as order
optimal and asymptotically optimal of order-1 in L and M if, respectively,

$$\lim_{L, M \to \infty} \frac{V^\pi(p_u) - V^*(p_u)}{V^\pi(p_u)} < 1, \qquad \lim_{L, M \to \infty} \frac{V^\pi(p_u) - V^*(p_u)}{V^\pi(p_u)} = 0.$$

Next theorems establish order and asymptotic optimality of our proposed policies. 

Theorem 1. Policy tti is asymptotically optimal of order-1 in L. 

Proof: The proof simply follows from Definition |4l Fact |2l and Proposition [3l ■ 

Theorem 2. For L > -^ — and if In > 0, policy 7r2 is order optimal in L and M. 

J-max ^ 

Proof: The proof simply follows from Definition |6l Corollary |2] and Proposition |4l ■ 

Remark 6. Note that a sufficient condition for $\underline{I}_2 > 0$ can be obtained by strengthening Assumption 1
in the following manner: there exists $C > 0$ such that for any hypothesis $H_i$, there exists an action $a \in A$
for which $\min_{q \in Q_i^a} D(q_i^a \| q) > C$, where for all $i \in \Omega$, $a \in A$, $Q_i^a$ is the convex hull of distributions
$\{q_j^a(\cdot)\}_{j \in \Omega - \{i\}}$ on $\mathcal{Z}$.

Theorem 3. Policy $\pi_2$ attains asymptotic optimality of order-2 in $L$ if

Furthermore, for $L > \frac{\log M}{I_{\max}}$ and if $I_{\max} = \underline{I}_2$ and $D_{\max} = D_{\eta_i}$, $\forall i \in \Omega$, policy $\pi_2$ is asymptotically
optimal of order-1 in $L$ and $M$.

Proof: The proof of the first part follows from Definition |5] Proposition [H and Proposition |4] The 
proof of the second part follows from Definition |6l Proposition HI and Corollary |2] ■ 
B. Information Acquisition Rate and Reliability

In this section, we explain the primal (constrained) version of Problem (P), referred to as Problem (P'),
and use the obtained bounds to extend the notions of achievable (communication) rate and error exponent
to the context of active sequential hypothesis testing. The proofs of all the results in this section are
provided in Appendix V.

Problem (P') [Information Acquisition Problem]

Consider a hypothesis testing problem with $M$ hypotheses of interest, action space $A$, and
observation kernels $\{q_i^a(\cdot)\}_{i \in \Omega, a \in A}$. A Bayesian decision maker with prior belief $p(0)$ is responsible for
finding the true hypothesis with the objective to

\[ \text{minimize } E[\tau] \quad \text{subject to } \mathrm{Pe} \le \epsilon, \tag{6} \]

where $\tau$ is the stopping time at which the decision maker retires, $\mathrm{Pe}$ is the probability of making
a wrong declaration, and $\epsilon > 0$ denotes the desired probability of error.

Problem (P) can be viewed as a Lagrangian relaxation of Problem (P'). It is somewhat intuitive that
as $L \to \infty$ the solution of Problem (P) is closely related to that of Problem (P') when $\epsilon \to 0$. The
following lemma makes this intuition precise.

Lemma 2. Let $E[\tau_\epsilon^*]$ denote the minimum expected number of samples required to achieve $\mathrm{Pe} \le \epsilon$. We
have

\[ E[\tau_\epsilon^*] \ge (1 - \epsilon L)\,\big(V^{*}(p(0)) - 1\big), \tag{7} \]

where $V^{*}(p(0))$ is the optimal solution to Problem (P) for prior belief $p(0)$ and penalty of wrong
declaration $L$.

Let $E^{\pi}[\tau]$ and $\mathrm{Pe}^{\pi}$ denote respectively the expected stopping time (or equivalently the expected
number of collected samples) and the probability of error under policy $\pi$. Following the notation
in [38], we define $M^{\pi}(t, \epsilon)$ as the maximum number of hypotheses among which policy $\pi$ can find the
true hypothesis with $E^{\pi}[\tau] \le t$ and $\mathrm{Pe}^{\pi} \le \epsilon$. Policy $\pi$ is said to achieve information acquisition rate
$R > 0$ with reliability (also known as error exponent) $E > 0$ if

\[ \lim_{t\to\infty} \frac{1}{t} \log M^{\pi}(t, 2^{-Et}) = R. \tag{8} \]

For a fixed number of hypotheses, hence at information rate $R = 0$, policy $\pi$ is said to achieve reliability
$E > 0$ if

\[ \lim_{t\to\infty} -\frac{1}{t} \log \mathrm{Pe}^{\pi}(t, M) = E, \tag{9} \]

where $\mathrm{Pe}^{\pi}(t, M)$ is the smallest probability of error that policy $\pi$ can guarantee when looking for the
true hypothesis among $M$ hypotheses with $E^{\pi}[\tau] \le t$.
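
As a purely illustrative aid, the finite-$t$ analogues of the quantities in (8) and (9) can be evaluated numerically. The sketch below assumes base-2 logarithms (suggested by the $2^{-Et}$ term in (8)); the function names `empirical_rate` and `empirical_reliability` are hypothetical and are not part of the analysis in this paper.

```python
import math


def empirical_rate(num_hypotheses: int, t: float) -> float:
    """Finite-t proxy for the rate in (8): (1/t) * log2(M)."""
    return math.log2(num_hypotheses) / t


def empirical_reliability(error_prob: float, t: float) -> float:
    """Finite-t proxy for the error exponent in (9): -(1/t) * log2(Pe)."""
    return -math.log2(error_prob) / t
```

For instance, resolving $M = 1024$ hypotheses with an expected $t = 20$ samples corresponds to $0.5$ bits per sample under these assumptions.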

The reliability function $E(R)$ is defined as the maximum achievable error exponent at information
rate $R$. Next we use the bounds obtained in Section IV as well as Lemma 2 to characterize upper
and lower bounds on the maximum achievable information acquisition rate and the optimal reliability
function.

Before we proceed, we refer the reader to Tables I and II for the list of notations introduced in Section IV.
Also let $D_1$ and $D_2$ denote respectively the harmonic means of $\{D_i\}_{i \in \Omega}$ and $\{D_{\eta_i}\}_{i \in \Omega}$, i.e.,

\[ D_1 = \Big( \sum_{i=1}^{M} \frac{1}{M D_i} \Big)^{-1}, \qquad D_2 = \Big( \sum_{i=1}^{M} \frac{1}{M D_{\eta_i}} \Big)^{-1}. \]

Moreover, let

\[ \underline{D}_1 = \inf_M D_1, \qquad \underline{D}_2 = \inf_M D_2. \]
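
To make the harmonic-mean quantities concrete, the following minimal sketch computes the harmonic mean of a list of positive divergence values. The helper name `harmonic_mean` and the sample numbers are illustrative assumptions only.

```python
def harmonic_mean(values):
    """Harmonic mean M / sum_i (1 / D_i) of strictly positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return len(values) / sum(1.0 / v for v in values)
```

The harmonic mean always lies between the smallest value and the arithmetic mean, so a single hypothesis with a small divergence pulls it down sharply.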



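Corollary 3. Suppose the hypotheses are equiprobable, i.e., $P(\{\theta = i\}) = \frac{1}{M}$, $\forall i \in \Omega$. No policy can
achieve positive reliability $E > 0$ at rates higher than $\overline{I}_{\max}$. Furthermore,

\[ E(R) \le \overline{D}_{\max} \Big( 1 - \frac{R}{\overline{I}_{\max}} \Big), \qquad R \in [0, \overline{I}_{\max}). \tag{10} \]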

Remark 7. Corollary 3 establishes an upper bound, $\overline{I}_{\max}$, on the maximum achievable information acquisition
rate. As shown in Appendix V-C, this can be strengthened to show that no policy can achieve
diminishing error probability at rates higher than $\overline{I}_{\max}$.

Corollary 4. For fixed $M$, hence at information rate $R = 0$, a policy $\pi$ can achieve the maximum
reliability, i.e., $E = D_1$, if and only if it is asymptotically optimal in $L$.

Corollary 4 implies that for fixed $M$, hence at $R = 0$, policies $\pi_1$ and $\pi^*$ achieve the optimal error
exponent.

Corollary 5. A policy $\pi$ can achieve a non-zero rate $R > 0$ with non-zero reliability $E > 0$ only if it
is order optimal in $L$ and $M$.

Corollary 6. Suppose the hypotheses are equiprobable, i.e., $P(\{\theta = i\}) = \frac{1}{M}$, $\forall i \in \Omega$. Policy $\pi_2$ can
achieve any rate $R \in [0, \underline{I}_2]$ with reliability $E$ if

\[ E \le \underline{D}_2 \Big( 1 - \frac{R}{\underline{I}_2} \Big). \tag{11} \]
Fig. 1 summarizes the results above. The upper bound on the reliability function is shown in red.
Policy $\pi_1$ achieves the optimal reliability $D_1$ at $R = 0$ with no provable guarantee for $R > 0$ (this
point is shown in green), while policy $\pi_2$ ensures an exponentially decaying error probability (the error
exponent is shown in blue) for $R \in [0, \underline{I}_2)$.

[Figure: upper bound and lower bound on $E(R)$ plotted against the rate.]

Fig. 1. Lower and upper bounds on the optimal reliability function $E(R)$.



Remark 8. It can be shown that any optimal policy $\pi^*$ for Problem (P) also achieves any rate $R \in [0, \underline{I}_2]$
with reliability $E$ satisfying (11).



C. Adaptivity Gain 

In this section, we first define a class of policies which do not fully utilize the observation outcomes. 
We then discuss the performance gap between these policies and the optimal one. 

Definition 7. A policy $\pi$ is referred to as open-loop or non-adaptive if, under this policy, sensing actions are
selected independently of the observation outcomes (hence independently of the belief state). For a given
vector $\lambda \in \Lambda(A)$, the non-adaptive policy $\pi_\lambda$ selects sensing action $a \in A$ with probability $\pi_\lambda(a|p) = \lambda_a$,
independent of $p$, until the stopping time $\tau$ is reached.

Definition 8. Let $V_{\lambda^*}(p)$ denote the value function for the best (randomized) non-adaptive policy.
The adaptivity gain is defined as $V_{\lambda^*}(p) - V^{*}(p)$, i.e., the increase in the expected total cost under the best
non-adaptive policy relative to the optimal policy.
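
The action-selection part of a non-adaptive policy can be sketched in a few lines. This is only an illustration under stated assumptions: actions are labeled by arbitrary hashable keys, the stopping rule is omitted, and the function name `non_adaptive_actions` is hypothetical.

```python
import random


def non_adaptive_actions(lam, num_steps, seed=0):
    """Draw actions i.i.d. from the fixed distribution lam (action -> probability).

    The draws never look at observations or at the belief state, which is
    what makes the policy open-loop.
    """
    rng = random.Random(seed)
    actions = list(lam.keys())
    weights = [lam[a] for a in actions]
    return rng.choices(actions, weights=weights, k=num_steps)
```

Because the sequence depends only on $\lambda$ and the random seed, it could be generated entirely before any observation is collected.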
The advantage of non-adaptive policies is that they do not require knowledge of the observations and/or
a careful reevaluation of the belief state when selecting sensing actions. The adaptivity gain defined
above characterizes the loss in performance due to non-adaptive selection of sensing actions. Next
we show that for fixed $M$, the performance gap between the non-adaptive policy and $\pi_1$ (hence the
optimal one) grows at least logarithmically as the penalty $L$ increases.

Let $V_\lambda(p)$ denote the expected total cost when the initial belief state is $p$ and non-adaptive policy
$\pi_\lambda$ is enforced. The following proposition provides a lower bound for $V_\lambda(p)$.

Proposition 5. Under Assumption 1 and for $L > 1$, $V_\lambda(p)$ is lower bounded by

^^ log i^ - log ^ 



EPi max ,-_ ^ — — - — - — ^ — K'x 



jW j:>^aD{q!^M) "^ 



MP 

aeA 
where $K'_\lambda$ is independent of the penalty $L$.

The proof is very similar to the proof of Proposition 1 and is provided in Appendix IV.

Corollary 7. Unless there exists a $\lambda \in \Lambda(A)$ such that

\[ \min_{j \ne i} \sum_{a \in A} \lambda_a D(q_i^a \| q_j^a) = \max_{\hat{\lambda} \in \Lambda(A)} \min_{j \ne i} \sum_{a \in A} \hat{\lambda}_a D(q_i^a \| q_j^a), \quad \forall i \in \Omega, \tag{12} \]

the adaptivity gain grows at least logarithmically with $L$.

Corollary 7 can be further simplified for the binary case ($M = 2$) as follows.

Corollary 8. In active binary hypothesis testing, the adaptivity gain grows logarithmically in $L$ if

\[ \arg\max_{a \in A} D(q_1^a \| q_2^a) \ne \arg\max_{a \in A} D(q_2^a \| q_1^a). \]
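
The condition of Corollary 8 is easy to check numerically for finite observation alphabets. The sketch below is a hedged illustration: it assumes discrete kernels with strictly positive probabilities, unique maximizers, and base-2 logarithms, and all names (`kl_divergence`, `most_informative_action`, `corollary8_condition`) are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in bits for two pmfs on the same finite alphabet."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def most_informative_action(kernels, i, j):
    """Action maximizing D(q_i^a || q_j^a).

    kernels[a] is a pair (pmf under H_1, pmf under H_2); i and j are 0 or 1.
    """
    return max(kernels, key=lambda a: kl_divergence(kernels[a][i], kernels[a][j]))


def corollary8_condition(kernels):
    """True when the best action under H_1 differs from the best action under H_2."""
    return most_informative_action(kernels, 0, 1) != most_informative_action(kernels, 1, 0)
```

When the condition returns `True`, no single action is simultaneously the most informative one under both hypotheses, which is the situation in which a fixed randomization over actions loses against an adaptive choice.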

VI. Examples

In this section, we consider important special cases of active hypothesis testing under which the
conditions of Theorem 3 hold, hence establishing the asymptotic optimality of $\pi_2$.

A. Binary Hypothesis Testing ($M = 2$)

Consider Problem (P) for $M = 2$. In this setting, as noted in Remark 3, policies $\pi_1$ and $\pi_2$ are
equivalent and by Theorem 1 both policies are asymptotically optimal of order-1 in $L$. Asymptotic
optimality of order-2 of $\pi_1$ and $\pi_2$ is also verified from Theorem 3 since equality (5) holds trivially for
$M = 2$. Furthermore, we obtain

log iy^ - log £i log i^ - log ^ 

^P>~P^ nmxD{qf\\q^) ^' m8ixD{q^\\q'^) ^^^^^- 

aeA aeA 
The problem of reliability (error exponent) associated with passive binary hypothesis testing with
fixed-length (non-sequential) as well as variable-length (sequential) sample size has been studied in [39]-[41].
Recently, the authors in [42], [43] have generalized this problem, for fixed-length and variable-length
sample size respectively, to the active testing context. Our work complements the findings in [43] by
providing an asymptotically optimal solution in a total cost (and Bayesian) sense.

B. Noisy Dynamic Search 

Consider the problem of sequentially searching for a single object of interest in $M$ locations,
where the goal is to find the object quickly and accurately. Let $A = 2^{\Omega}$ be the set of all allowable
combinations of locations that can be searched in one time slot. The outcome of the search is a random
variable with probability density function $f_{\mathrm{obj}}$ if the object is in the location(s) being searched;
otherwise, it is distributed as $f_{\mathrm{noise}}$.

The problem above can be modeled as an active hypothesis testing problem with action space $A$ and the
observation kernels

\[ q_i^a(\cdot) = \begin{cases} f_{\mathrm{obj}}(\cdot) & \text{if } i \in a \\ f_{\mathrm{noise}}(\cdot) & \text{if } i \notin a \end{cases}, \qquad \forall i \in \Omega,\ \forall a \in A. \]

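Lemma 3. Consider the problem of noisy dynamic search explained above. Under the symmetry
condition $f_{\mathrm{obj}}(z) = f_{\mathrm{noise}}(b - z)$ for some $b \in \mathbb{R}$,

\[ \underline{I}_2 = D\big(f_{\mathrm{obj}} \,\big\|\, \tfrac{1}{2} f_{\mathrm{obj}} + \tfrac{1}{2} f_{\mathrm{noise}}\big) = I_{\max}, \tag{13} \]

\[ \min_{j \ne i} \max_{a \in A} D(q_i^a \| q_j^a) = D_{\eta_i} = D(f_{\mathrm{obj}} \| f_{\mathrm{noise}}) = D_{\max}, \quad \forall i \in \Omega. \tag{14} \]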

Lemma 3 together with Theorem 3 implies that policy $\pi_2$ attains 1) asymptotic optimality of order-2
in $L$; and 2) asymptotic optimality of order-1 in $L$ and $M$.

Under the condition of Lemma 3, the schemes proposed in [21], [22], [23] simplify to the one that
searches, at each instant, a location with the highest probability of having the object. This scheme,
which was also studied in [44] in a finite horizon context with symmetric observations and was shown
to be optimal among policies that can only search a single location at a time, has an information
acquisition rate that is restricted to zero, while at zero rate it achieves asymptotic optimality and
the maximum error exponent $D_{\max}$. In contrast, in [45], [46], a generalization of binary search was proposed
in which the locations are partitioned along the median of the posterior and, in effect, are searched
along a generalized binary tree. It was shown in [45], [46] that in the special case of Bernoulli noise
where $f_{\mathrm{obj}} = B(1 - p)$ and $f_{\mathrm{noise}} = B(p)$, $p \in (0, 0.5)$, the proposed policy can achieve any rate
$R < D(f_{\mathrm{obj}} \| \tfrac{1}{2} f_{\mathrm{obj}} + \tfrac{1}{2} f_{\mathrm{noise}})$ with reliability $E = D(f_{\mathrm{obj}} \| \tfrac{1}{2} f_{\mathrm{obj}} + \tfrac{1}{2} f_{\mathrm{noise}}) - R$. In other words, the policy
was shown to be asymptotically optimal in $M$ (since $\underline{I}_2 = I_{\max}$) but only order optimal in $L$ (since
$0 < D(f_{\mathrm{obj}} \| \tfrac{1}{2} f_{\mathrm{obj}} + \tfrac{1}{2} f_{\mathrm{noise}}) < D(f_{\mathrm{obj}} \| f_{\mathrm{noise}}) = D_{\max}$).

Our proposed policy $\pi_2$ combines the best of the above two worlds: in its first phase, by randomly
selecting actions from $A$, it ensures the maximum acquisition rate $D(f_{\mathrm{obj}} \| \tfrac{1}{2} f_{\mathrm{obj}} + \tfrac{1}{2} f_{\mathrm{noise}})$ obtained by the
generalized binary search of [45], [46], while its second phase coincides with the schemes in [21], [22],
[23], ensuring the maximum feasible error exponent.
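
For the Bernoulli noise model mentioned above, the two divergences that govern the rate and the error exponent can be computed directly. The sketch below is illustrative only; it assumes base-2 logarithms and reads $B(1-p)$ as a Bernoulli distribution that outputs 1 with probability $1-p$, so that the equal mixture of $f_{\mathrm{obj}}$ and $f_{\mathrm{noise}}$ is $B(1/2)$. The function names are hypothetical.

```python
import math


def kl_bernoulli(a, b):
    """D(Bern(a) || Bern(b)) in bits, for a and b strictly between 0 and 1."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))


def search_exponents(p):
    """Return (rate limit, maximum error exponent) for f_obj = B(1-p), f_noise = B(p).

    rate limit = D(f_obj || 1/2 f_obj + 1/2 f_noise), where the mixture is B(1/2);
    max exponent = D(f_obj || f_noise).
    """
    rate_limit = kl_bernoulli(1 - p, 0.5)
    d_max = kl_bernoulli(1 - p, p)
    return rate_limit, d_max
```

With $p = 0.1$ the rate limit equals one minus the binary entropy of $p$, and it is strictly smaller than $D(f_{\mathrm{obj}} \| f_{\mathrm{noise}})$, in line with the strict inequality stated in the text.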

C. Variable-Length Coding with Noiseless Feedback 

Consider the feedback communication system depicted in Fig. 2. Let $\Omega = \{1, 2, \ldots, M\}$ denote the
message set. We assume a sender wishes to communicate a discrete message $\theta \in \Omega$ to the receiver with
probability of error less than some $\epsilon > 0$ over a noisy memoryless channel. The channel is described by a
finite input set $\mathcal{X}$ and output set $\mathcal{Y}$ (possibly uncountable), and a collection of conditional probabilities
$P(Y|X)$. Let $C$ denote the Shannon capacity of this channel and $C_1$ the KL divergence between its
two most distinguishable inputs:

\[ C_1 = \max_{x, x' \in \mathcal{X}} D\big(P(Y|X = x) \,\|\, P(Y|X = x')\big). \]

The encoder receives perfect knowledge of the decoder's past received signals through a noiseless
causal feedback link. Using this knowledge, the encoder decides what to transmit and when to terminate
the transmission. The objective is to find encoding/stopping/decoding rules which achieve a desired
probability of error, i.e., $\epsilon$, with minimum expected number of channel uses. To achieve this goal, the
encoder sequentially and causally selects the input sequence $\{X_t\}$ until a stopping time $\tau$, and the decoder
follows a decoding rule $d : \mathcal{Y}^{\tau} \to \Omega$. The objective is to ensure that $P(\{\hat{\theta} \ne \theta\}) \le \epsilon$ while $E[\tau]$ is
minimized, where $\hat{\theta} = d(Y^{\tau})$.












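[Figure: block diagram with channel input $X_t$, the channel, channel output $Y_t$, the decoder, and a feedback link carrying $Y_{t-1}$.]

Fig. 2. A noisy memoryless channel with a noiseless causal feedback link.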

Let $\mathcal{E} := \{e(\cdot) : \Omega \to \mathcal{X}\}$ be the set of all mappings from the set of messages $\Omega$ to $\mathcal{X}$. In [47],
using the results from [48], we showed that without loss of generality a fictitious agent can be added to
this communication system who has access to the past channel outputs and is responsible for selecting
actions from $\mathcal{E} \cup \{d\}$. The choice of decoding, i.e., action $d$, marks the termination of the transmission
phase, the stopping time $\tau$, while the choice of encoding function $e_t$ at time $t$ determines the input to
the channel at time $t$, i.e., $X_t = e_t(\theta)$. The addition of this fictitious agent does not change the nature
of the problem. The reason is that the decision of the fictitious agent at any time $t$ relies solely on $Y^{t-1}$,
which is fully observable by both transmitter and receiver and hence can be replicated at the transmitter
and the receiver in isolation.

From this point of view, the problem of variable-length coding with noiseless feedback is closely
related to a special case of Problem (P') defined in Section V-B where a fictitious agent plays the role
of the Bayesian decision maker whose $K$ available actions coincide with $\mathcal{E}$ ($A = \mathcal{E}$ and $K = |\mathcal{X}|^{M}$),
and whose observation kernels are given by $q_i^e(y) = P(Y = y \mid X = e(i))$.
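
The correspondence between encoding functions and sensing actions can be made concrete for small alphabets. The sketch below is an illustration under stated assumptions: the channel is finite and given as a nested dictionary, messages are numbered from 1, and the names `encoding_functions` and `observation_kernel` are hypothetical.

```python
from itertools import product


def encoding_functions(num_messages, input_alphabet):
    """All maps e: {1, ..., M} -> X, each stored as a tuple with e[i - 1] = e(i)."""
    return list(product(input_alphabet, repeat=num_messages))


def observation_kernel(e, i, channel):
    """Kernel for message i under encoding function e: P(Y = . | X = e(i)).

    channel[x] is a dict mapping each output symbol y to P(Y = y | X = x).
    """
    return channel[e[i - 1]]
```

Enumerating the mappings confirms the count of $|\mathcal{X}|^{M}$ actions, which grows exponentially in the number of messages; the enumeration is therefore only practical for toy examples.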

Lemma 4. For the problem of variable-length coding with feedback we have

\[ \underline{I}_2 = C = I_{\max}, \tag{15} \]

\[ \underline{D}_2 = C_1 = D_{\max}. \tag{16} \]

The proof is provided in Appendix VI.

Therefore, for the problem of variable-length coding with feedback with equiprobable messages, i.e.,
$P(\{\theta = i\}) = \frac{1}{M}$, $\forall i \in \Omega$, we have

-^ + -^ - O(loglog-) < E[r;] < -^ + -^ + 0(1)- (17) 

Furthermore, the optimal reliability function can be characterized as

\[ E(R) = C_1 \Big( 1 - \frac{R}{C} \Big), \tag{18} \]

and is achieved by policy $\pi_2$.
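
To illustrate a reliability function of the form $E(R) = C_1(1 - R/C)$, consider, purely as an assumed example that is not treated in the text, a binary symmetric channel with crossover probability $p$. For that channel $C = 1 - h(p)$ bits and $C_1 = (1 - 2p)\log_2\frac{1-p}{p}$. The function names below are hypothetical.

```python
import math


def bsc_capacity(p):
    """Shannon capacity of a BSC with crossover p in (0, 0.5), in bits."""
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1.0 - entropy


def bsc_c1(p):
    """KL divergence between the two output distributions of a BSC, in bits."""
    return (1 - 2 * p) * math.log2((1 - p) / p)


def reliability(rate, capacity, c1):
    """Straight-line reliability function C_1 * (1 - R / C) for 0 <= R <= C."""
    if not 0 <= rate <= capacity:
        raise ValueError("rate must lie in [0, capacity]")
    return c1 * (1 - rate / capacity)
```

The function decreases linearly from $C_1$ at zero rate to zero at capacity, so halving the rate from capacity recovers half of the zero-rate exponent.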

The problem above was first tackled by Burnashev in his seminal paper [11]. In fact, policy $\pi_2$ is
nothing but a natural generalization of Burnashev's two-phase coding scheme.

The lower bound in (17) was proved in [11] using a lengthy martingale argument, and was later
reproved in [49] and [38]. The proofs in [49] and [38] parallel the two-phase encoding scheme
corresponding to the upper bound but provide a slightly tighter lower bound (the double logarithmic term can
be replaced by a constant when $M$ is fixed). In this light, Proposition 2 provides yet another alternative
proof for Burnashev's lower bound utilizing the MDP formulation dH).

VII. Discussion and Future Work 

In this paper, we considered the problem of active sequential $M$-ary hypothesis testing. Using a DP
formulation, we characterized the optimal value function $V^*$. Three lower bounds (complementary for
various values of the parameters of the problem) were obtained for the optimal value function $V^*$. We
also proposed two heuristic policies whose performance analysis resulted in two upper bounds for $V^*$.
Subsequently, we discussed important consequences of the bounds and established order and asymptotic
optimality of the proposed policies under different scenarios. An important problem which remains is
further improvement of the performance bounds.

In this paper, we focused on sequential policies, i.e., policies whose sample size is not known initially
and is dependent on the observation outcomes. There exist other types of policies in the literature. For
example, non-sequential policies take a fixed number of samples (independent of observation outcomes)
and make the final decision afterwards, while multi-stage policies (introduced in [33], [34]) can take
a retire/declare action only at the end of each stage, and stages are not necessarily of the same size.
Comparing the performance of sequential, non-sequential, and multi-stage policies in the context of
active hypothesis testing is an area of future work.

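Appendix I
Proof of Lemma 1

To prove Lemma 1 we have to slightly modify the state space and introduce new notation. We
assume that after taking the retire/declare action, the system goes to the termination state, denoted by
$F$, and remains in that state for the rest of the time. The state space is modified to $S = \mathbb{P}(\Theta) \cup \{F\}$
to include the termination state. For $a \in A \cup \{d\}$, $s \in S$, let

\[ c^a(s) = \begin{cases} 1 & \text{if } s = p \in \mathbb{P}(\Theta),\ a \in A \\ \min_{j \in \Omega} (1 - p_j) L & \text{if } s = p \in \mathbb{P}(\Theta),\ a = d \\ 0 & \text{if } s = F. \end{cases} \]

The Bayes operator is modified as follows:

\[ \Phi^a(s, z) = \begin{cases} \Phi^a(p, z) & \text{if } s = p \in \mathbb{P}(\Theta),\ a \in A \\ F & \text{if } s = p \in \mathbb{P}(\Theta),\ a = d \\ F & \text{if } s = F. \end{cases} \]

Using the notation above, the condition $V(p) \le \min\{1 + \min_{a \in A} (T^a V)(p),\ \min_{j \in \Omega} (1 - p_j) L\}$ is rewritten
as

\[ V(F) = 0, \]
\[ V(s) \le \min_{a \in A \cup \{d\}} \big\{ c^a(s) + E[V(\Phi^a(s, Z))] \big\}, \quad \forall s \in S - \{F\}. \tag{19} \]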
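Let $S_0, S_1, S_2, \ldots$ be a sequence of random variables denoting the belief states at times $t = 0, 1, 2, \ldots$
starting from belief state $s$, i.e.,

\[ S_0 = s, \qquad S_n = \Phi^{A(n-1)}(S_{n-1}, Z), \quad \forall n > 0. \]

Using (19) iteratively $N$ times, we obtain

\begin{align*}
V(s) &\le E^{\pi^*}[c^{A(0)}(s)] + E^{\pi^*}[V(\Phi^{A(0)}(s, Z))] \\
&\le E^{\pi^*}\Big[\sum_{n=0}^{1} c^{A(n)}(S_n)\Big] + E^{\pi^*}[V(S_2)] \\
&\;\;\vdots \\
&\le E^{\pi^*}\Big[\sum_{n=0}^{N-1} c^{A(n)}(S_n)\Big] + E^{\pi^*}[V(S_N)],
\end{align*}

where superscript $\pi^*$ implies that actions are selected according to an optimal policy $\pi^*$. Taking the
limit as $N \to \infty$, we obtain

\begin{align*}
V(s) &\overset{(a)}{\le} E^{\pi^*}\Big[\sum_{n=0}^{\infty} c^{A(n)}(S_n)\Big] + \liminf_{N\to\infty} E^{\pi^*}[V(S_N)] \\
&\overset{(b)}{=} V^*(s) + \liminf_{N\to\infty} E^{\pi^*}[V(S_N)] \\
&= V^*(s) + \liminf_{N\to\infty} E^{\pi^*}\big[V(F)\mathbf{1}_{\{S_N = F\}} + V(S_N)\mathbf{1}_{\{S_N \ne F\}}\big] \\
&= V^*(s) + \liminf_{N\to\infty} E^{\pi^*}\big[V(S_N)\mathbf{1}_{\{S_N \ne F\}}\big] \\
&\overset{(c)}{\le} V^*(s) + L \liminf_{N\to\infty} P^{\pi^*}(S_N \ne F) \\
&\overset{(d)}{=} V^*(s),
\end{align*}

where (a) follows from the monotone convergence theorem, (b) follows from the definition of $V^*$,
(c) follows from the fact that for any $p \in \mathbb{P}(\Theta)$, $V(p) \le \min_{j}(1 - p_j)L \le L$, and (d) holds since
$L \ge V^*(s) \ge E^{\pi^*}[\tau] = \sum_{n=0}^{\infty} P^{\pi^*}(\tau > n) = \sum_{n=0}^{\infty} P^{\pi^*}(S_n \ne F)$, so that the series is finite and
$P^{\pi^*}(S_N \ne F) \to 0$ as $N \to \infty$.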
Appendix II
Proof of Propositions 1 and 2 and Corollary 1

A. Proof of Proposition 1

Let $\Gamma$ be the set of all mappings $\gamma : \Omega \to \Omega$ such that $\gamma(i) \ne i$ for $i \in \Omega$. Now associated with any
$\gamma \in \Gamma$ define



Vi^p) 



M log 1^- log ^ 



max D{qf\\q^, 



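Next we use Lemma 1 to show that $V^* \ge V_1^{\gamma}$ for all $\gamma \in \Gamma$. In particular, we show that for all $\gamma \in \Gamma$
and all $p \in \mathbb{P}(\Theta)$, $V_1^{\gamma}(p) \le \min\{1 + \min_{a \in A} (T^a V_1^{\gamma})(p),\ \min_{j \in \Omega} (1 - p_j) L\}$. For any $p$ such that $V_1^{\gamma}(p) = 0$,
the inequality holds trivially. For $V_1^{\gamma}(p) > 0$ and for any action $a \in A$ we have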



M 



i=l 



> E / p^^^(< 



log 



1-L- 



log 






M 



v{'{p)-J2p 



max D{q^\\q^,..) 

^('?rll<(.)) 



dz-K[ 



-^ ma^D(gf||g«(,)) 



> V{<{p) - I. 



Claim 1 (in Appendix VII). Constant $K_1'$ can be selected independent of $L$ such that $V_1^{\gamma}(p) \le \min_{j \in \Omega} (1 - p_j) L$
is satisfied for all $\gamma \in \Gamma$.

Using Claim 1 and letting $V_1(\cdot) = \max_{\gamma \in \Gamma} V_1^{\gamma}(\cdot)$, we have the assertion of the proposition.



M-l 



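B. Proof of Proposition 2

Recall that $I_{\max} = \max_{a \in A} \max_{p \in \mathbb{P}(\Theta)} I(p; q_p^a)$ and recall the definition of $\alpha(L, M)$. We first show that for all belief
vectors $p \in \mathbb{P}(\Theta)$,

\[ V^*(p) \ge \left[ \frac{H(p) - H([\alpha(L, M), 1 - \alpha(L, M)]) - \alpha(L, M) \log(M - 1)}{I_{\max}} + \alpha(L, M) L \right]^{+}. \tag{20} \]

Note that the right-hand side of (20) can be written as

\[ G(p) := \left[ \frac{H(p) - H(\nu)}{I_{\max}} + \alpha(L, M) L \right]^{+}, \]

where

\[ \nu = \left[ \frac{\alpha(L, M)}{M - 1}, \ldots, \frac{\alpha(L, M)}{M - 1}, 1 - \alpha(L, M) \right]. \]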
Next we show that for all $p \in \mathbb{P}(\Theta)$, $G(p) \le \min\{1 + \min_{a \in A} (T^a G)(p),\ \min_{j \in \Omega} (1 - p_j) L\}$. For any $p$ such
that $G(p) = 0$, the inequality holds trivially. For $G(p) > 0$ and for any action $a \in A$ we have

\begin{align*}
(T^a G)(p) &\ge \frac{\int H(\Phi^a(p, z))\, q_p^a(z)\, dz - H(\nu)}{I_{\max}} + \alpha(L, M) L \\
&= \frac{H(p) - I(p; q_p^a) - H(\nu)}{I_{\max}} + \alpha(L, M) L \\
&= G(p) - \frac{I(p; q_p^a)}{I_{\max}} \\
&\ge G(p) - 1,
\end{align*}

where the last inequality follows from the fact that $I(p; q_p^a) \le \max_{a \in A} \max_{p \in \mathbb{P}(\Theta)} I(p; q_p^a) = I_{\max}$. Therefore

\[ G(p) \le 1 + \min_{a \in A} (T^a G)(p). \]



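What remains is to show that $G(p) \le \min_{j \in \Omega} (1 - p_j) L$. Rewriting $G$ as

\[ G(p) = \left[ \frac{\sum_{i=1}^{M-1} p_i \log \frac{1}{p_i} + \big(1 - \sum_{i=1}^{M-1} p_i\big) \log \frac{1}{1 - \sum_{i=1}^{M-1} p_i} - H(\nu)}{I_{\max}} + \alpha(L, M) L \right]^{+}, \]

we can compute the gradient at $\nu$. For all $i = 1, 2, \ldots, M - 1$,

\begin{align*}
\frac{\partial G}{\partial p_i}(\nu) &= \left. \frac{\log \frac{1}{p_i} - \log e - \log \frac{1}{1 - \sum_{i=1}^{M-1} p_i} + \log e}{I_{\max}} \right|_{p = \nu} \\
&= \left. \frac{1}{I_{\max}} \log \frac{1 - \sum_{i=1}^{M-1} p_i}{p_i} \right|_{p = \nu} \\
&= \frac{1}{I_{\max}} \log \frac{1 - \alpha(L, M)}{\alpha(L, M) / (M - 1)} \\
&= L.
\end{align*}

Furthermore, $G(\nu) = \alpha(L, M) L = (1 - \nu_M) L$. Without loss of generality and since both functions $G(p)$
and $\min_{j}(1 - p_j) L$ are symmetric, let us focus on $\mathbb{P}_M(\Theta) := \{p \in \mathbb{P}(\Theta) : p_M \ge p_i,\ \forall i \in \Omega - \{M\}\}$. In
this case, $\min_{j}(1 - p_j) L = (1 - p_M) L = \sum_{i=1}^{M-1} p_i L$, and hence $\min_{j}(1 - p_j) L$ is the tangent hyperplane
to $G(p)$ at $\nu$. This along with the concavity of $G$ implies $G(p) \le \min_{j}(1 - p_j) L$. Using Lemma 1,
we have the assertion of the proposition.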
Next we need to show that 

-H{p)-H{[6,l-6])-6\og{M-l] 



v\p) > VM 



^max 



(21) 



log 



l-L- 



iog¥. 



^max '^^ 



{inaxpi<l— 5} 



K' 



We show this in two steps. First we consider the following concave function: 



J\p) :- 



■^ log 
l^Pi — 

4 = 1 



l-L- 



log ^ - log Y^ 



Dr 



"^ -K', 
We use Jensen's inequality to show that

Claim 2 (in Appendix VII). For all $p \in \mathbb{P}(\Theta)$, $J'(p) \le 1 + \min_{a \in A} (T^a J')(p)$.

Next we define $J(p) = \max\{J'(p), J''(p)\}$ where $J''(p)$ is the right-hand side of (21), i.e.,

'H{p)-H{[5,l-5])-5\og{M-l) , log i^- log i-^ 



J'\P) 



J- max J-^max '^^ 



{maxpi<l— 5} 



Ji^ 



Case 1: For all $p$ such that $J(p) = 0$ or $J(p) = J'(p)$, it follows from Claim 2 that

\[ J(p) = J'(p) \le 1 + \min_{a \in A} (T^a J')(p) \le 1 + \min_{a \in A} (T^a J)(p). \tag{22} \]

Case 2: For all $p$ such that $J(p) = J''(p) > 0$, and for any action $a \in A$, we have

(TV)(p) = y"j($"(p,z))g^(^)rfz 

(a) J H{^\p, z))ql{z)dz - H{[6, 1 - 6]) -6\og{M - 1) 



J- max 



log i^ - log Y 



H T^ -'-{maxpi<l-<5} — -"-2 



r(p) '*"'* 



ai 



-'max' 
> /'(P) - 1 

= J{p) - 1, (23) 

where (a) follows from Claim |3] below and (6) holds since p is such that J{p) = J"{p). 

Claim 3 (in Appendix IVIII) . Let p be such that J{p) = J"{p) > 0. If Assumption [2] holds, then for all 
actions a E A and observations z E Z, 
jf^a...^ H{^''iP,^))-H{[S,l-6])-6\og{M-l) log^-log^ . ^, 

-'l^ IP5 ^JJ > f ^ 75 l{maxpi<l-5} - ^2- 

^rnax ^m.ax '^ 

(24) 
In other words, from (22) and (23), we have that

\[ J(p) \le 1 + \min_{a \in A} (T^a J)(p). \tag{25} \]

We also have

Claim 4 (in Appendix VII). For $L > \frac{\log M}{I_{\max}}$, constant $K_2'$ can be selected independent of $L$ and $M$ such
that $J(p) \le \min_{j \in \Omega} (1 - p_j) L$ is satisfied.

Lemma 1 together with (25) and Claim 4 implies that $V^* \ge J = \max\{J', J''\} \ge J'' = V_2$. This is
a slightly stronger result than (21).
C. Proof of Corollary 1

We show that for all belief vectors $\nu \in \mathbb{P}(\Theta)$ for which $\nu_i = 1 - \alpha(L, M)$ for an $i \in \Omega$ and
$\nu_j = \frac{\alpha(L, M)}{M - 1}$ for all $j \in \Omega - \{i\}$, the optimal action is to retire and declare $H_i$ as the true hypothesis,
i.e., $V^*(\nu) = L(1 - \nu_i)$. Without loss of generality, consider $i = M$, hence

\[ \nu = \left[ \frac{\alpha(L, M)}{M - 1}, \ldots, \frac{\alpha(L, M)}{M - 1}, 1 - \alpha(L, M) \right]. \]

In the proof of Proposition 2, we have seen that $V^*(\nu) \ge G(\nu) = \alpha(L, M) L = (1 - \nu_M) L$. Furthermore,
DP equation © implies that $V^*(\nu) \le \min_{j}(1 - \nu_j) L = (1 - \nu_M) L$. Therefore, $V^*(\nu) = (1 - \nu_M) L$ and
the proof is complete.

Remark 9. The technique used to prove Proposition 2 and Corollary 1 can be applied to further improve
our proposed policies $\pi_1$ and $\pi_2$ by providing a better heuristic for the stopping criteria. More precisely,
let $\mathcal{R}_i$ denote the collection of all belief vectors at which it is optimal to retire and declare $H_i$ as the
true hypothesis. According to Corollary 1 in [50], the sets $\mathcal{R}_i$, $i \in \Omega$, are convex. Since $\{p \in \mathbb{P}(\Theta) : (1 - p_i) L \le 1\} \subset \mathcal{R}_i$
(see Remark 1) and $\{p \in \mathbb{P}(\Theta) : p_i = 1 - \alpha(L, M),\ p_j = \frac{\alpha(L, M)}{M - 1},\ \forall j \ne i\} \subset \mathcal{R}_i$ (proved in this
appendix), $\mathcal{R}_i$ can be estimated by the convex hull of the above sets.

Appendix III
Proof of Propositions 3 and 4

A. Proof of Proposition 3

Recall that $p_i(n)$ denotes the posterior belief about hypothesis $H_i$ after $n$ observations. Let $\tau$, $\tau_i$,
$i \in \Omega$, be Markov stopping times defined as follows:

\[ \tau := \min\Big\{ n : \min_{j \in \Omega} \{1 - p_j(n)\} \le L^{-1} \Big\}, \tag{26} \]

\[ \tau_i := \min\Big\{ n : \min_{j \ne i} \frac{p_i(n)}{p_j(n)} \ge \frac{1 - L^{-1}}{L^{-1}/(M - 1)} \Big\}. \tag{27} \]

From (dl), the total cost under policy $\pi_1$ can be written as

\begin{align*}
V^{\pi_1}(p) &= E^{\pi_1}\Big[\tau + \min_{j \in \Omega} (1 - p_j(\tau)) L\Big] \\
&\le E^{\pi_1}[\tau] + 1 \\
&\le \sum_{i=1}^{M} p_i E^{\pi_1}[\tau_i \mid \theta = i] + 1, \tag{28}
\end{align*}

where $p = [p_1, p_2, \ldots, p_M] = [p_1(0), p_2(0), \ldots, p_M(0)]$ and the last inequality follows from the fact that
$\tau \le \tau_i$, $\forall i \in \Omega$. For notational simplicity, superscript $\pi_1$ is dropped for the rest of the proof.
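
The posterior recursion and a confidence-based stopping test of the kind used above can be sketched as follows. This is a minimal illustration, assuming a finite set of hypotheses and that the likelihood of the latest observation under each hypothesis is supplied by the caller; the names `bayes_update` and `should_stop` are hypothetical.

```python
def bayes_update(belief, likelihoods):
    """One Bayes step: belief[i] is p_i(n), likelihoods[i] is q_i^a(z) for the new sample."""
    unnormalized = [p * lk for p, lk in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total <= 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return [u / total for u in unnormalized]


def should_stop(belief, penalty):
    """Stopping test: min_j (1 - p_j(n)) <= 1 / L."""
    return min(1.0 - p for p in belief) <= 1.0 / penalty
```

Stopping at this threshold keeps the expected penalty of a wrong declaration, $\min_j (1 - p_j(\tau)) L$, at most one, i.e., no larger than the cost of one additional sample.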
Next we find an upper bound for $E[\tau_i \mid \theta = i]$, $i \in \Omega$. Before we proceed, however, we introduce the
following notation to facilitate the proof. Let

T := log — -— log 



L- 



I-P 



TAp) := log — — minlog — . 

'^^' ^(l-p)/(M-l) k^^ ^pfc 



For any i > 0, we have 

oo 

= Y,P{{n>n}\e = , 

n=0 

(^T,ip) T* AogM logL 



J2 P{{r.>n}\9 = ^) 

M) 



"->(^ + ^ )(!+') 



< 



log(W)ferTy-nnnlog^ log i-4r^ - log ^ o(M + log L) 



log- 



L) 



M, 



/? 



log(M - 


-1) 


/i 




log(M - 


-1) 



niin log 



Pi 



1 -p) fc^i pfc/ \Ii D 



I 



<^.__ + 



log 7T^ — min log — 



h 



log 



Mi 



log — T-Ti min log — 



kf^i 



Pk 



D 



Mi 



o(M + log L) 



+ 



D 



^^ ^ o(M + log L) 



n 



< 



log(M-l) + log^ log 



l-L- 



min log -^ / 7, ^ , 1 r ^ 
fc^i ^ Pfe o(M + log L) 



h ■ i^M. ■ n 



(29) 



where inequality (a) follows from Lemma |5] below and by setting l = j- (log ML) *. Combining (1281) 
and (|29l) completes the proof of Proposition |3l 

Lemma 5. Given any l > and for n > ( -j^ + -^ j (1 + l), there exists B(l) and b(i) as shown 
in ^ such that P{{r, > n}\9 = i) < M5(t)e-''W". 

Proof of Lemma \5} 
Let Bij{n) and Bij(n) be events in the probability space defined as follows: 



Bij{n) 



B,,{n) 



log —r- < log 
log — ^^ < lo; 



L-V(A'/ - 1) 
P 



p,in) -(l_p)/(M-l) 
We have,





Pj{n) 
PiH 


Pi(^) 



Pjl 



n 



- E[log 


Pi{n) 


Pj{n) 


- E[log 


PiH 


Pj{n) 


- E[log 


Pz{n) 


Pj{n) 


- E[log 


PiH 


Pj{n) 



< log 

< log 

< lo£ 



P 



Pil'^j 



(l-p)/(M 
P 



-TT-E[l0g^] 

1 '■ Piin)-' 



Pj( 



e = i 



„_,V(M---4lo.fH-E.o.J^] 



^ = Z 



p 



(l-p)/(M 
<T;(p)-nJi}|^ = z 



— — min log — — nil ( I d 

1) ^=7^* Pk J 



Similarly, we can show that 

P{B,,{n)\e = i)< P({ log ^ - E[log 
where 






Pi{n') 
Tj = mm < n : mm — ^ — - > 



<T*-{n-n)D^^[\e = t 



W >n)-. 



j^^ p,{n') - (l-p)/(M-l) 
By construction (|27l) . 

P({r, >n}|e = z) 
<P(U,y,P,,H|^ = 2) 

< P(U,^.P.,H n {f. < ^ + n^}\e = ^) + P({f, >^^ + n^}\e 

ii i ~r L ii i + 6 



f,{p) , V2 



<5^ P(P.,Hn{r.<^ + n^}|^ 



5^ P(P.,(m)|0 



m:m> \ +n^^ 



Pi('^) 



,T,(p) , T* 1 + l/2^ 



E(^(Vos^-«i'o.^i<^.,(if^ + 7^-"i^)}l»-0 



+ 



pj[n) pj[n) 

Pi{m) 



E p(\ '». 



i, 1+1. 



Pj{m) 



n\ogP^% < Up) -mh\\e = t 



For any a,a E A and z,j G r2, we have 



log ^ - log % 



Ii 



(30) 



(31) 



(32) 



< 2 log^. For A; = 1, 2, . . . , n, let X^ 



log %^ and X = [Xi, X2, . . . , X„]. Define function /(X) = log g + ELi ^fc = log gfey- ^'^^^ ^^Z 



and Fact [3] below, and for n > ( -^f^ + -^ j (1 + i), we have 
$P(\{\tau_i > n\} \mid \theta = i)$



< (M - 1) 



exp I —2n 



2ioge 



1 + L/2-- 



1 T,{p) 



+ 



n \ h D^^ 



:i + o 



+ > exp -2m -^ — ^ 



m:m> ' +>^ti~ 



< (M-1^ 



< (M - 1) 



21oge 



1 + ,/2 - lZM(i + ,/2) 



m Ji 



exp — n— , 

^ ' 2 V 21og^ 



+ 



E 



exp I — ?n — 



iWh/{l + L/2) 



m:m> v +Wt^ 
-fi 1+1. 



21og^ 



exp I —n — 



r /'Axyo + o 

2 V 21oge 



,3 /^Ji/(l + ./2) 



l-exp(-f ( 



^ /^ /i/(lW2) 
2 log 5 



(33) 



Fact 3 (McDiarmid's Inequality [51]). Let $X = [X_1, X_2, \ldots, X_n]$ be a family of independent random
variables with $X_k$ taking values in a set $\mathcal{X}_k$ for each $k$. Suppose a real-valued function $f$ defined on
$\prod_{k=1}^{n} \mathcal{X}_k$ satisfies $|f(x) - f(x')| \le c_k$ whenever the vectors $x$ and $x'$ differ only in the $k$-th coordinate.
Then for any $u \ge 0$,

\[ P\big(f(X) - E[f(X)] \le -u\big) \le e^{-2u^2 / \sum_{k=1}^{n} c_k^2}. \]
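
As a sanity check of the lower-tail form $e^{-2u^2/\sum_k c_k^2}$ of McDiarmid's inequality, the bound can be compared with an exact tail probability. The sketch below is illustrative only: it takes $f$ to be the sum of $n$ fair coin flips, so that each $c_k = 1$, and the function names are hypothetical.

```python
import math


def mcdiarmid_bound(u, c):
    """Upper bound exp(-2 u^2 / sum_k c_k^2) on P(f(X) - E[f(X)] <= -u)."""
    return math.exp(-2.0 * u * u / sum(ck * ck for ck in c))


def binomial_lower_tail(n, k):
    """Exact P(S <= k) for S distributed as Binomial(n, 1/2)."""
    return sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n
```

For $n = 20$ and $u = 5$ the exact probability $P(S \le 5)$ is well below the bound $e^{-2.5}$, which reflects that the inequality is generally not tight.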



B. Proof of Proposition 4

Recall that $p_i(n)$ denotes the posterior belief about hypothesis $H_i$ after $n$ observations. Let $\tau$, $\tau_i$,
$i \in \Omega$, be Markov stopping times defined as follows:

\[ \tau := \min\Big\{ n : \min_{j \in \Omega} \{1 - p_j(n)\} \le L^{-1} \Big\}, \tag{34} \]

\[ \tau_i := \min\big\{ n : p_i(n) \ge 1 - L^{-1} \big\}. \tag{35} \]

From (dl), the total cost under policy $\pi_2$ is upper bounded as

\begin{align*}
V^{\pi_2}(p) &= E^{\pi_2}\Big[\tau + \min_{j \in \Omega} (1 - p_j(\tau)) L\Big] \\
&\le E^{\pi_2}[\tau] + 1 \\
&\le \sum_{i=1}^{M} p_i E^{\pi_2}[\tau_i \mid \theta = i] + 1, \tag{36}
\end{align*}

where $p = [p_1, p_2, \ldots, p_M] = [p_1(0), p_2(0), \ldots, p_M(0)]$ and the last inequality follows from the fact that
$\tau \le \tau_i$, $\forall i \in \Omega$. For notational simplicity, superscript $\pi_2$ is dropped for the rest of the proof.

Next we find an upper bound for $E[\tau_i \mid \theta = i]$, $i \in \Omega$. Let $U_n := \log \frac{p_i(n)}{1 - p_i(n)} - \log \frac{\tilde{p}}{1 - \tilde{p}}$ and let $\mathcal{F}_n$ denote
the history of previous actions and observations up to time $n$, i.e., $\mathcal{F}_n = \sigma\{p(0), A(0), Z(0), \ldots, A(n - 1), Z(n - 1)\}$.
Under policy $\pi_2$, the sequence $\{U_n\}$, $n = 0, 1, \ldots$, forms a submartingale with respect
to the filtration $\{\mathcal{F}_n\}$ with the following properties:



(CI) \Un - Un-i\ < max max sup log ^iM < log^. 



ijGfl aG.4 262 



<(^) 



(C2) If t/„ > (p,(n) > p ^ P({A(n) = a}) = r/,J: 

= Y, PiiMn) = a})E [f/„H_i - f/„,| J-„, ^ = z, A(n) = a] 
= ^ r/iaE [f/„+i - UnlJ'n, 9 = i, A{n) = a] 



aeA 



^'7mE 



aeA 









qtiz) 



Pj{n) „ai 



1 - pi{n) 
-dz 



^3^i l-pdn)'ij 



> max min ^^ XaD(qf\\'S^ —^ — g^) 



aeA 



j^i 



D 



no 



(C3) If f/„ < and pj(n) < p for all j (^ P({A(n) = a}) = r/oa): 

= X r/oaK [f/„+i - Un\J^n. e = i,A{ 



aGA 



aGA 



Yvoa qt {z)\og 



qt[z) 



Pj(") ^at 



n) = a\ 



-dz 






> max min min >^ AoZ^fof 11 >^ — o'j') 



agy4 



j¥=i 



h. 
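Stopping time $\tau_i$ can be rewritten as

\begin{align*}
\tau_i &= \min\big\{ n : p_i(n) \ge 1 - L^{-1} \big\} \\
&= \min\Big\{ n : \frac{p_i(n)}{1 - p_i(n)} \ge \frac{1 - L^{-1}}{L^{-1}} \Big\} \\
&= \min\Big\{ n : \log \frac{p_i(n)}{1 - p_i(n)} - \log \frac{\tilde{p}}{1 - \tilde{p}} \ge \log \frac{1 - L^{-1}}{L^{-1}} - \log \frac{\tilde{p}}{1 - \tilde{p}} \Big\}. \tag{37}
\end{align*}

The assertion of the proposition follows from (37) and the following lemma, which is a slight
generalization of Lemma 6 in [11].

Lemma 6. Assume that the sequence $\{\xi_n\}$, $n = 0, 1, \ldots$, forms a submartingale with respect to a
filtration $\{\mathcal{F}_n\}$. Furthermore, assume there exist positive constants $K_1 < K_2 < K_3$ such that

\begin{align*}
E[\xi_{n+1} \mid \mathcal{F}_n] &\ge \xi_n + K_1 \quad \text{if } \xi_n < 0, \\
E[\xi_{n+1} \mid \mathcal{F}_n] &\ge \xi_n + K_2 \quad \text{if } \xi_n \ge 0, \\
|\xi_{n+1} - \xi_n| &\le K_3.
\end{align*}

Consider the stopping time $\upsilon = \min\{n : \xi_n \ge B\}$, $B > 0$. Then we have the inequality

\[ E[\upsilon] \le \frac{B - \xi_0}{K_2} + \xi_0 \mathbf{1}_{\{\xi_0 < 0\}} \Big( \frac{1}{K_2} - \frac{1}{K_1} \Big) + \frac{3 K_3^2}{K_1 K_2}. \]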
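
The flavor of such a hitting-time bound can be illustrated with a simple random walk. The sketch below assumes the bound has the form $\frac{B - \xi_0}{K_2} + \xi_0 \mathbf{1}_{\{\xi_0 < 0\}}\big(\frac{1}{K_2} - \frac{1}{K_1}\big) + \frac{3K_3^2}{K_1 K_2}$ and uses a walk with steps $\pm 1$ and upward probability $0.75$, so that the drift is $0.5$ everywhere and the increments are bounded by $1$. The function names and the choice of walk are illustrative assumptions, not part of the proof.

```python
import random


def lemma6_bound(b, xi0, k1, k2, k3):
    """Assumed bound: (B - xi0)/K2 + xi0 * 1{xi0 < 0} * (1/K2 - 1/K1) + 3*K3^2/(K1*K2)."""
    negative_part = xi0 if xi0 < 0 else 0.0
    return (b - xi0) / k2 + negative_part * (1.0 / k2 - 1.0 / k1) + 3.0 * k3 ** 2 / (k1 * k2)


def mean_hitting_time(b, up_prob, trials=2000, seed=0):
    """Monte Carlo estimate of E[min{n : xi_n >= b}] for a +/-1 walk started at 0."""
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(trials):
        position, steps = 0, 0
        while position < b:
            position += 1 if rng.random() < up_prob else -1
            steps += 1
        total_steps += steps
    return total_steps / trials
```

With $B = 10$, $K_1 = 0.2$, $K_2 = 0.5$ and $K_3 = 1$, the simulated mean hitting time is close to $B / K_2 = 20$, comfortably below the bound; the additive constant matters little once $B$ is large.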



In particular, from (C1)-(C3) and Lemma |6l we have 

Pi¥.[n\e = i]< Pi ^ + — ^ '- + 3 

logi^ , P.log^ , logT^(/^,.-/2)+3(logO^ 

= P* — ?r^ — + r^ + A 



^r,. /2 hD^^ 



iog_i^ paogig _^ 






where K'g = log i^-Dmaz + 3(log^)^ is independent of L and M. Now from (l36l) . (l38l) . and the fact 
that ^j=i Pi log ^—^ < H{p), we have the assertion of the proposition. 

Appendix IV
Proof of Proposition 5

$V_\lambda$ is the solution to the following DP equation:

$$V_\lambda(\rho) = \min\Big\{1 + \sum_{a \in \mathcal{A}} \lambda_a (\mathbb{T}^a V_\lambda)(\rho),\ \min_{j \in \Omega}(1 - \rho_j)L\Big\}.$$

Let $\Gamma$ be the set of all mappings $\gamma : \Omega \to \Omega$ such that $\gamma(i) \ne i$ for $i \in \Omega$. Now associated with any $\gamma \in \Gamma$ define



Vx^p) 



pi 






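Next we use a slight modification of Lemma 1 in which $\min_{a}(\mathbb{T}^a V)(\rho)$ is replaced by $\sum_{a} \lambda_a (\mathbb{T}^a V)(\rho)$ to show that $V_\lambda \ge V_\lambda^\gamma$ for all $\gamma \in \Gamma$. In particular, we show that for all $\gamma \in \Gamma$ and all $\rho \in \mathbb{P}(\Theta)$, $V_\lambda^\gamma(\rho) \le \min\{1 + \sum_{a \in \mathcal{A}} \lambda_a (\mathbb{T}^a V_\lambda^\gamma)(\rho),\ \min_{j \in \Omega}(1 - \rho_j)L\}$. For any $\rho$ such that $V_\lambda^\gamma(\rho) = 0$, the inequality holds trivially. For $V_\lambda^\gamma(\rho) > 0$ and for any action $a \in \mathcal{A}$ we have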



aeA 



M „ log 1^- log "^"^^'^ 






M „ log.'^^''^^) 



> Vx\p) - E P- E ^ J ^"(^) V A m!^a^ ) ^" 

^=l aeA -^ _2./a^ Will 9^(0 ) 

a£A 

= Vx\p)-l. 

Claim 5 (in Appendix I VIII) . Constant K'^ can be selected independent of L such that V\'''(p) < 

min(l — pj)L is satisfied for all 7 G F. 

Using Claim [5] and letting Vx(-) = max Va'^(-), we have the assertion of the proposition. 

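Appendix V
Information Acquisition Rate and Reliability

A. Proof of Lemma 2

Let $\tau_L := \min\{n : \max_{j \in \Omega} \rho_j(n) \ge 1 - L^{-1}\}$. Then

$$\begin{aligned}
E[\tau_\epsilon^*] &\ge E\big[\tau_\epsilon^* \mid \max_j \rho_j(\tau_\epsilon^*) \ge 1 - L^{-1}\big]\, P\big(\{\max_j \rho_j(\tau_\epsilon^*) \ge 1 - L^{-1}\}\big) \\
&\overset{(a)}{\ge} E\big[\tau_\epsilon^* \mid \max_j \rho_j(\tau_\epsilon^*) \ge 1 - L^{-1}\big] \big(1 - E[1 - \max_j \rho_j(\tau_\epsilon^*)]\, L\big) \\
&\overset{(b)}{\ge} E\big[\tau_\epsilon^* \mid \max_j \rho_j(\tau_\epsilon^*) \ge 1 - L^{-1}\big] (1 - \epsilon L) \\
&\ge E[\tau_L]\,(1 - \epsilon L),
\end{aligned} \tag{39}$$

where (a) follows from Markov inequality and (b) follows from the definition of $\tau_\epsilon^*$, which implies that
$Pe = E[1 - \max_j \rho_j(\tau_\epsilon^*)] \le \epsilon$.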

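Let $V_L : \mathbb{P}(\Theta) \to \mathbb{R}_+$ be the minimal solution to the following fixed point equation:

$$V_L(\rho) = \min\Big\{1 + \min_{a \in \mathcal{A}} (\mathbb{T}^a V_L)(\rho),\ \mathcal{L}(\rho)\Big\}, \tag{40}$$

where

$$\mathcal{L}(\rho) = \begin{cases} 0 & \text{if } \min_{j \in \Omega}(1 - \rho_j) \le L^{-1}, \\ \infty & \text{otherwise.} \end{cases} \tag{41}$$

It can be easily shown that

$$E[\tau_L] = V_L(\rho(0)) \ge V^*(\rho(0)) - 1. \tag{42}$$

The inequality holds because stopping at $\tau_L$ and declaring the most likely hypothesis gives an error probability of at most $L^{-1}$, and hence a total cost of at most $V_L(\rho(0)) + L \cdot L^{-1}$. Combining (39) and (42) completes the proof.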



B. Proof of Corollary 3

Set e = 2-^\ L = --^, and 5 = -^. For L > M! and t > |t, we can use Lemma [21 and 

e log i log i Jmai! £• 

Proposition [2] to obtain the following lower bound 



^<^^^'-Wr 



g(p(0))-i,logM log^-logEt 

^max -'-^max *^'' 



For uniform prior p(0) = [1/M, . . . , 1/M], the lower bound simplifies to 



E r > 



:i - ^f logM ^ 1 i^t-21og£;t _ ^ 



^rnax ^'^ ^max 



Therefore, for any policy $\pi$,



:i-;|j)2logM-(t,2-^*) ^^ 1 ^Et-2\ogEt 



and hence. 



1-^) n " -0{l)<t, 



-, . t- (1- J_)^*z?^°s^* +0(1)_ 
lim -logM-(t,2-^*) < lim ^ ''' ''— ^/„^ax 



E I 2(l--^ )logi;t+l 

'max ' 

£-)-oo (1 — 



iiiJ^i T" ]^ \ 2 -'-max 



Et' 



Imax ( 1 - =^ 1 (43) 



'max 



^ -' max • \y^) 
Inequality (44) implies that for $R > 0$, no policy can achieve rates greater than $I_{\max}$. Furthermore, using (43), the reliability function can be bounded as



. Ir 

Next we show that for fixed $M$, hence at $R = 0$, no policy can achieve reliability higher than $D_1$. From Lemma 2 and Fact 2 and for uniform prior $\rho(0) = [1/M, \ldots, 1/M]$, we obtain the following lower bound

$$E[\tau_\epsilon^*] \ge (1 - \epsilon L)\Big(\frac{\log L}{D_1} - o(\log L)\Big).$$

Using the inequality above, we can find a lower bound on $\epsilon$ such that $E[\tau_\epsilon^*] \le t$ is satisfied. More precisely, for any policy $\pi$ we obtain

Pe-(t,M)>L-Ml- ,^^, \ 

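We can select $L = 2^{D_1 t + o(t)}$ such that it satisfies

$$\frac{1}{Pe^{\pi}(t, M)} \le 2^{D_1 t + o(t)}\, O(t), \tag{45}$$

and hence,

$$\begin{aligned}
\lim_{t \to \infty} -\frac{1}{t} \log Pe^{\pi}(t, M) &\le \lim_{t \to \infty} \frac{1}{t} \log\big(2^{D_1 t + o(t)}\, O(t)\big) \\
&= \lim_{t \to \infty} \Big(D_1 + \frac{o(t)}{t} + \frac{O(\log t)}{t}\Big) \\
&= D_1.
\end{aligned} \tag{46}$$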

C. Generalizing the Result of Corollary 3

The result of Corollary 3 can be strengthened to show that no policy can achieve diminishing error probability at rates higher than $I_{\max}$. Here we provide a sketch of the proof.

Using (40) and a slight modification of Lemma 1 in which $\min_{j \in \Omega}(1 - \rho_j)L$ is replaced by $\mathcal{L}(\rho)$ (as defined in (41)), we can find the following lower bound for $V_L(\rho)$:



Mp) 



H{p)-Hi[6,l-6])-6log{M-l) log 1^- log Y 

Dmar. '6" 



, '"O L-^ '"& (5 1 TV- 

H ;^ -'-{maxpi<l-5} — -f^ 



(47) 



where Ji^ = max { " ^"6i^'^-^;+^ ^ " iog,.»-x;^,os,^z ^ " iogv.»^x;-^x ^ _ 

J- max J-^max I ^max J-^max 



L-Mog(J\/-l)+l L-Mog(Af-l)+logg+2 ) ^ L-Mog(A/~l)+l _l logg+1 

J- max J-^n 

Combining (39), (42), and (47), we get

$$E[\tau_\epsilon^*] \ge (1 - \epsilon L)\, V_L(\rho(0)).$$
Let $u(t)$ be a function such that $u(t) \to 0$ as $t \to \infty$ but for any $E > 0$, $u(t)\, 2^{Et} \to \infty$. In other words, $\frac{\log u(t)}{t} \to 0$ as $t \to \infty$. Set $\epsilon = u(t)$, $L = 1/\sqrt{u(t)}$, and $\delta = \sqrt{u(t)}$. We obtain the following lower bound

E[r;] > (1 - v^) V W 6 

Therefore, for any policy $\pi$,



0(1) . (48) 



'l-?,^u{t) + 2u{t))\ogM-{tMt)) _ ^(^^ < ^ 



and hence. 



lim \ log ^^^(t, U{t))< Mm ^ ^i=^^ —J max = I ma.. (49) 

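D. Proof of Corollary 4

By definition, policy $\pi$ is asymptotically optimal in $L$ if $V^{\pi}(\rho) = E^{\pi}[\tau] + L\, Pe^{\pi} \le \frac{\log L}{D_1} + o(\log L)$. This implies that policy $\pi$ can achieve $Pe^{\pi} \le L^{-1}\big(\frac{\log L}{D_1} + o(\log L)\big)$ with $E^{\pi}[\tau] \le \frac{\log L}{D_1} + o(\log L)$. To satisfy $E^{\pi}[\tau] \le t$, $L$ can be selected as $L = 2^{D_1 t (1 - o(1))}$ where $o(1) \to 0$ as $t \to \infty$. For this selection of $L$, $Pe^{\pi}(t, M)$ is bounded as

$$Pe^{\pi}(t, M) \le 2^{-D_1 t (1 - o(1))}\, t,$$

and by definition, the error exponent of policy $\pi$ is

$$\begin{aligned}
\lim_{t \to \infty} -\frac{1}{t} \log Pe^{\pi}(t, M) &\ge \lim_{t \to \infty} -\frac{1}{t}\big(\log 2^{-D_1 t (1 - o(1))} + \log t\big) \\
&= \lim_{t \to \infty} \Big(D_1 (1 - o(1)) - \frac{\log t}{t}\Big) \\
&= D_1.
\end{aligned} \tag{50}$$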
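The limit in (50) can be illustrated numerically. The sketch below is only an illustration: it evaluates $-\frac{1}{t}\log_2$ of the bound $2^{-D_1 t(1 - o(1))}\, t$ in the log domain (to avoid underflow), and it assumes a concrete vanishing term $1/\sqrt{t}$ in place of the unspecified $o(1)$. The function name and the choice $D_1 = 2$ are hypothetical.

```python
import math

def error_exponent_estimate(t, D1, slack=lambda t: 1.0 / math.sqrt(t)):
    """Return -(1/t) * log2(2^{-D1 * t * (1 - slack(t))} * t), computed in the log domain."""
    log2_bound = -D1 * t * (1.0 - slack(t)) + math.log2(t)
    return -log2_bound / t

# The estimate approaches D1 as the sample budget t grows.
for t in (10, 100, 10_000, 1_000_000):
    print(t, error_exponent_estimate(t, D1=2.0))
```

Both correction terms, the assumed slack and $\log t / t$, vanish as $t$ grows, which is why the polynomial factor $t$ in the error bound does not affect the exponent.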

E. Proof of Corollary 5

We show that if policy $\pi$ is not an order optimal solution to Problem (P), then it cannot achieve non-zero rate with non-zero reliability.

Let $E^{\pi}[\tau_\epsilon]$ denote the expected number of samples that policy $\pi$ requires to achieve $Pe^{\pi} \le \epsilon$. If policy $\pi$ is not order optimal in $L$ and $M$, then $E^{\pi}[\tau_\epsilon]$ is either 1) $\omega(\log M) + O(\log\frac{1}{\epsilon})$ where $\frac{\omega(\log M)}{\log M} \to \infty$ as $M \to \infty$; 2) $O(\log M) + \omega(\log\frac{1}{\epsilon})$; or 3) $\omega(\log M) + \omega(\log\frac{1}{\epsilon})$. Proof is done by contradiction. Suppose $E^{\pi}[\tau_\epsilon] = O(\log M) + O(\log\frac{1}{\epsilon})$. Then

$$V^{\pi}(\rho) = E^{\pi}[\tau] + L\, Pe^{\pi} \le \big(E^{\pi}[\tau_\epsilon] + L\epsilon\big)\Big|_{\epsilon = L^{-1}} = O(\log M) + O(\log L),$$

which is an order optimal solution to Problem (P).

• Case 1: $E^{\pi}[\tau_\epsilon] = \omega(\log M) + O(\log\frac{1}{\epsilon})$.

Setting $\epsilon = 2^{-Et}$ for some $E > 0$, we obtain from the condition above that $\log M^{\pi}(t, 2^{-Et}) = o(t)$. By definition,

$$R = \lim_{t \to \infty} \frac{1}{t} \log M^{\pi}(t, 2^{-Et}) = \lim_{t \to \infty} \frac{o(t)}{t} = 0.$$

• Case 2: $E^{\pi}[\tau_\epsilon] = O(\log M) + \omega(\log\frac{1}{\epsilon})$.

Setting $M = 2^{Rt}$ for some $R > 0$, we obtain from the condition above that $-\log Pe^{\pi}(t, 2^{Rt}) = o(t)$. By definition,

$$E = \lim_{t \to \infty} -\frac{1}{t} \log Pe^{\pi}(t, 2^{Rt}) = \lim_{t \to \infty} \frac{o(t)}{t} = 0.$$

• Case 3: $E^{\pi}[\tau_\epsilon] = \omega(\log M) + \omega(\log\frac{1}{\epsilon})$.
The proof follows similar lines as the proofs of Cases 1 and 2.

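F. Proof of Corollary 6

As shown in the proof of $V_2$ in Appendix III, policy $\pi_2$ can achieve $Pe^{\pi_2} \le L^{-1}$ with $E^{\pi_2}[\tau] \le \frac{\log M}{I_2} + \frac{\log L}{D_2} + O(1)$. For $L = 2^{Et}$ we obtain

$$\lim_{t \to \infty} \frac{1}{t} \log M^{\pi_2}(t, 2^{-Et}) \ge \lim_{t \to \infty} \frac{t - \frac{Et}{D_2} - O(1)}{t}\, I_2 = I_2\Big(1 - \frac{E}{D_2}\Big). \tag{51}$$

Therefore, for any rate $R \in [0, I_2]$, there exists a reliability $E$, $E \ge D_2\big(1 - \frac{R}{I_2}\big)$, such that $\pi_2$ can achieve rate $R$ with reliability $E$.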
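The achievable region described above is a straight line trading rate against reliability: solving $R = I_2(1 - E/D_2)$ for $E$ gives $E = D_2(1 - R/I_2)$. A minimal sketch of this tradeoff follows; the function name and the numerical values of $I_2$ and $D_2$ are hypothetical and only serve as an illustration.

```python
def reliability_lower_bound(R, I2, D2):
    """Reliability guaranteed at rate R on the line E = D2 * (1 - R / I2)."""
    if not 0.0 <= R <= I2:
        raise ValueError("rate must lie in [0, I2]")
    return D2 * (1.0 - R / I2)

# Hypothetical constants, chosen only to show the shape of the tradeoff.
I2, D2 = 0.5, 2.0
for R in (0.0, 0.125, 0.25, 0.375, 0.5):
    print(R, reliability_lower_bound(R, I2, D2))
```

At zero rate the guaranteed reliability equals $D_2$, and it decreases linearly to zero as the rate approaches $I_2$.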

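G. Achievable Rate and Reliability Region for $\pi^*$

By definition, $V^*(\rho) = E^{\pi^*}[\tau] + L\, Pe^{\pi^*}$. Thus, $E^{\pi^*}[\tau] \le V^*(\rho) \le V_2(\rho)$ and $Pe^{\pi^*} \le L^{-1} V^*(\rho) \le L^{-1} V_2(\rho)$. Let $L = t\, 2^{Et}$ and suppose the hypotheses are equiprobable. We obtain

$$\lim_{t \to \infty} \frac{1}{t} \log M^{\pi^*}(t, 2^{-Et}) \ge \lim_{t \to \infty} \frac{t - \frac{\log(t\, 2^{Et})}{D_2} - O(1)}{t}\, I_2 = I_2\Big(1 - \frac{E}{D_2}\Big). \tag{52}$$

Therefore, for any rate $R \in [0, I_2]$, there exists a reliability $E$, $E \ge D_2\big(1 - \frac{R}{I_2}\big)$, such that $\pi^*$ can achieve rate $R$ with reliability $E$.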
Appendix VI
Proof of Lemma 4

Recall that $I_{\max} = \max_{e} \max_{\rho} I(\rho; q^e_\rho)$ is the maximum (taken over the distribution of $\theta$) mutual information between $\theta$ and observation $Y$. Since $\theta \to X \to Y$ forms a Markov chain, we obtain from the data processing inequality that $I_{\max} \le \max_{P_X} I(X; Y) = C$, where $P_X$ denotes the distribution of $X$.

Let $P_X^*$ be the capacity achieving probability distribution defined on $\mathcal{X}$. Let $\lambda^*$ be such that for all $e \in \mathcal{E}$, $\lambda_e^* = \prod_{i=1}^{M} P_X^*(e(i))$. We have,

Pi 



Jo = max min min > \eD[q^A\y — of 

AeA(£) ien p6Pi(e) ^ ' " ' ^ 1 - p^ ^ 



> min min ^^ A*L)(g''|| ^^ - — ^gf) 

= ^Pi(x)z^(p(r|x = x)|| Y. Pxi^')PiY\x = ^')) 

where (a) follows from Jensen's inequality and the fact that under randomization $\lambda^*$, observation kernels $q_i^e$ and $q_j^e$, $i \ne j$, are independent from each other; and (b) follows from Theorem 4.5.1 in [32].

From the discussion above, we obtain $I_{\max} \le C \le I_2$. Equality (15) follows from Corollary 3, which implies that no policy can achieve rates higher than $I_{\max}$ and hence $I_2 \le I_{\max}$.

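Next we prove equality (16). Let $x, x' \in \mathcal{X}$ be two inputs of the channel satisfying $D(P(Y|X = x) \| P(Y|X = x')) = C_1$. We have,

$$\begin{aligned}
D_{\max} &= \max_{i,j \in \Omega} \max_{e \in \mathcal{E}} D(q_i^e \| q_j^e) \\
&= \max_{i,j \in \Omega} \max_{e \in \mathcal{E}} D\big(P(Y|X = e(i)) \,\|\, P(Y|X = e(j))\big) \\
&= D\big(P(Y|X = x) \,\|\, P(Y|X = x')\big) \\
&= C_1.
\end{aligned} \tag{53}$$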

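Let $e^* \in \mathcal{E}$, $i \in \Omega$, be such that $e^*(i) = x$ and for all $j \ne i$, $e^*(j) = x'$. For all $i \in \Omega$, we get

$$\begin{aligned}
\max_{\lambda} \min_{\hat\rho} \sum_{e \in \mathcal{E}} \lambda_e D\Big(q_i^e \Big\| \sum_{j \ne i} \frac{\hat\rho_j}{1 - \hat\rho_i} q_j^e\Big) &\ge \min_{\hat\rho} D\Big(q_i^{e^*} \Big\| \sum_{j \ne i} \frac{\hat\rho_j}{1 - \hat\rho_i} q_j^{e^*}\Big) \\
&= D\big(P(Y|X = x) \,\|\, P(Y|X = x')\big),
\end{aligned}$$

and by definition,

$$D_2 = \Bigg[\frac{1}{M} \sum_{i=1}^{M} \frac{1}{\max_{\lambda} \min_{\hat\rho} \sum_{e \in \mathcal{E}} \lambda_e D\big(q_i^e \,\big\|\, \sum_{j \ne i} \frac{\hat\rho_j}{1 - \hat\rho_i} q_j^e\big)}\Bigg]^{-1} \ge \Bigg[\frac{1}{M} \sum_{i=1}^{M} \frac{1}{C_1}\Bigg]^{-1} = C_1. \tag{54}$$

Combining (53) and (54), we obtain $D_{\max} = C_1 \le D_2$. Equality (16) follows from Corollary 3, which implies that no policy can achieve reliability higher than $D_{\max}$ and hence $D_2 \le D_{\max}$.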



Appendix VII 
Proof of Technical Claims 



A. Proof of Claim 1

Let $D_{\min} = \min_{i,j \in \Omega,\, i \ne j} \max_{a \in \mathcal{A}} D(q_i^a \| q_j^a)$. First we notice that, if $\rho \notin \mathbb{P}_L(\Theta)$, then $V_1$ is bounded as:



Vi{p) < 



M 



E 



Pi max 



log i^- log ^ 






^; 



ae^ 



< 



(a) 
< 



< 



logL + logj- 

Z^ A — ^ ^1 

{i£Q:pi<l-L-^} 



D^ 



{ien:p,<i-L-''}\ -> + 



log L + lofe ^^ 

El 2^{igf2;p.<l_L-l} A 

P 

{ien:p,<i~L-^} 
2 + L-i log(M - 1 



J-^ni it 



Dr 



-K[ 



(55) 



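where (a) follows by Jensen's inequality; and (b) follows since $L > 1$, $\sum_{\{i \in \Omega : \rho_i < 1 - L^{-1}\}} \rho_i \le L^{-1}$ for any $\rho \notin \mathbb{P}_L(\Theta)$, and $x \log\frac{1}{x} \le 1$ for $x \in [0, 1]$. In other words, for $K'_1 \ge \frac{2 + L^{-1}\log(M-1)}{D_{\min}}$,

$$V_1(\rho) = 0 \le \min_{j \in \Omega}(1 - \rho_j)L \quad \forall \rho \notin \mathbb{P}_L(\Theta).$$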
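The elementary fact $x \log\frac{1}{x} \le 1$ on $[0, 1]$, used in step (b), can be confirmed on a fine grid. This is a small numerical check rather than a proof; it assumes base-2 logarithms and the convention $0 \log\frac{1}{0} = 0$.

```python
import math

def x_log_inv_x(x):
    """Evaluate x * log2(1 / x), with the convention that the value at x = 0 is 0."""
    return 0.0 if x == 0.0 else x * math.log2(1.0 / x)

grid = [k / 10_000 for k in range(10_001)]        # 0, 0.0001, ..., 1
largest = max(x_log_inv_x(x) for x in grid)
print("largest value on the grid:", largest)
```

The maximum is attained at $x = 1/e$ and equals $\log_2(e)/e$, roughly 0.53, so the bound by 1 holds with room to spare.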



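On the other hand, for all $\rho \in \mathbb{P}_L(\Theta)$, we have

$$V_1(\rho) \le \Bigg[\frac{\log(L-1)}{D_{\min}} + \frac{\sum_{i=1}^{M} \rho_i \log\frac{1-\rho_i}{\rho_i}}{D_{\min}} - K'_1\Bigg]^+. \tag{56}$$

Now let

$$\begin{aligned}
f_1(L, \rho) &:= \frac{\log(L-1)}{D_{\min}} + \frac{\sum_{i=1}^{M} \rho_i \log\frac{1-\rho_i}{\rho_i}}{D_{\min}} \\
&\overset{(a)}{=} \frac{\log(L-1)}{D_{\min}} + \frac{\sum_{i=1}^{M-1} \rho_i \log\frac{1-\rho_i}{\rho_i} + \big(1 - \sum_{i=1}^{M-1} \rho_i\big) \log\frac{\sum_{i=1}^{M-1} \rho_i}{1 - \sum_{i=1}^{M-1} \rho_i}}{D_{\min}},
\end{aligned} \tag{57}$$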
where (a) holds since $\sum_{i=1}^{M} \rho_i = 1$. Furthermore, let $\alpha = \max\{1, \frac{2}{D_{\min}}\}$, $\beta = \frac{1}{3}\big(D_{\min} - \frac{\log e}{\alpha}\big)$, and $L^* = \max\{2\alpha, \frac{4}{\beta}, \frac{1}{\beta^2}, \frac{\log(M-1)}{\beta}\}$. Next we show that for all $L \ge L^*$ and at belief vector $\rho = \big[\frac{\alpha}{(M-1)L}, \ldots, \frac{\alpha}{(M-1)L}, 1 - \frac{\alpha}{L}\big]$, function $f_1(L, \cdot)$ has an upper bound independent of $L$ while its partial derivatives with respect to $\rho_i$, $i = 1, 2, \ldots, M-1$, are less than $L$:

_ log(L-l) , ^^=1 (1731)1 log ^iigz^ + (l-l) log TTf 

'P l(M-i)L^-^(M-l)L''- Li Umin J^min 

W^^(^^ log ^^^^ + log i# 



J-^m i/n J-J J-Jrrt 'in 



^7mn 



cJL-l) 



log^^^ , alog(M-l) + 21og 



L 



~ D D ■ 

_ logTzJEr ap + 2 

D D ■ 
W log(2a-l) + a/J + 2 
- D~ ' ^ ^ 

where (a) follows from the fact that $x \log\frac{1}{x} \le 1$ for $x \in [0, 1]$ and $L \ge \frac{\log(M-1)}{\beta}$, and (b) holds since $L \ge 2\alpha$. And,

n \^1 PI I a a -i a] 

Opi 'P-[(A/-1)L'---'(M-1)L'-^ lI 

1 - Pi log e 1 - Pa/ log e ^ 1 

log log ■ 



Pi I - Pi PM '^- PM J Drain P [(m-i)l '-' (m-i)l '^ l] 

(M — 1)L — a loge , a loge\ 1 

log s log ■ 



" -L (A/-1)L -tv « ^ i LJr 



< (log(M-l) + 21ogL + i^^L^ ^ 



a / D 



'mm 



L + ( log(M - 1) + 21ogL - ( A™n - ^1 i^^ ^ 



< L + (log(M - 1) + 21ogL - 3/3L) 
<L + (logL-/3L) ^ 



1 



£>, 



mm 



<L+(logL-log(/3L)2) 



< i^, (59) 

where (a) follows from the fact that $\log x^2 \le x$ for $x \ge 4$.

From symmetry and concavity of $f_1(L, \cdot)$, (58), (59), it is clear that for $L \ge L^*$, and for all $\rho \in \mathbb{P}_L(\Theta)$,

$$\Big[f_1(L, \rho) - \frac{\log(2\alpha - 1) + \alpha\beta + 2}{D_{\min}}\Big]^+ \le \min_{j \in \Omega}(1 - \rho_j)L. \tag{60}$$

Fig. 3 shows this for $M = 2$. Furthermore, for all $L < L^*$,

$$f_1(L, \rho) \le f_1(L^*, \rho) \le \max_{\rho} f_1(L^*, \rho) \le \frac{\log L^* + \log M}{D_{\min}}.$$

This together with (55), (56), and (60) implies the assertion of the claim for

$$K'_1 = \frac{1 + \alpha\beta + \log L^* + \log M}{D_{\min}} \ge \frac{\max\{2 + L^{-1}\log(M-1),\ \log L^* + \log M,\ \log(2\alpha - 1) + \alpha\beta + 2\}}{D_{\min}}.$$



Fig. 3. Computing $K'_1$ for $M = 2$, $L = 5$, and $D_{\min} = 0.75$; the plot compares $\min\{\rho_1, 1 - \rho_1\}L$ with $f_1(L, \rho) - 0.56$. In this example, the derivative of $f_1$ with respect to $\rho_1$ is equal to $L$ at $\rho_1 = 0.35$ and $K'_1 \ge 0.56$ ensures that $f_1(L, \rho) - K'_1 \le \min\{\rho_1, 1 - \rho_1\}L$. We have $\alpha = 2.67$, $\beta = 0.07$, $L^* = 206.06$, and $K'_1 = 13.16$.
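The constants quoted in the caption of Fig. 3 can be reproduced with a short computation. The sketch below rests on assumptions that are a reading of the proof of Claim 1, not a verbatim transcription: logarithms are base 2, $\alpha = \max\{1, 2/D_{\min}\}$, $\beta = \frac{1}{3}(D_{\min} - \log e/\alpha)$, $L^* = \max\{2\alpha, 4/\beta, 1/\beta^2, \log(M-1)/\beta\}$, $K'_1 = (1 + \alpha\beta + \log L^* + \log M)/D_{\min}$, and $f_1(L, \rho) = \big(\log(L-1) + \sum_i \rho_i \log\frac{1-\rho_i}{\rho_i}\big)/D_{\min}$. The function names are illustrative.

```python
import math

def claim1_constants(M, D_min):
    """Return (alpha, beta, L_star, K1) under the assumed definitions above."""
    alpha = max(1.0, 2.0 / D_min)
    beta = (D_min - math.log2(math.e) / alpha) / 3.0
    L_star = max(2.0 * alpha, 4.0 / beta, 1.0 / beta ** 2, math.log2(M - 1) / beta)
    K1 = (1.0 + alpha * beta + math.log2(L_star) + math.log2(M)) / D_min
    return alpha, beta, L_star, K1

def f1_binary(L, p1, D_min):
    """f_1(L, rho) for M = 2 and the belief vector rho = [p1, 1 - p1]."""
    odds = math.log2((1.0 - p1) / p1)
    return (math.log2(L - 1.0) + p1 * odds - (1.0 - p1) * odds) / D_min

alpha, beta, L_star, K1 = claim1_constants(M=2, D_min=0.75)
gap = f1_binary(5.0, 0.35, 0.75) - min(0.35, 0.65) * 5.0
print(round(alpha, 2), round(beta, 2), round(L_star, 2), round(K1, 2), round(gap, 2))
```

Under these assumptions the computed values agree with the caption ($\alpha \approx 2.67$, $\beta \approx 0.07$, $L^* \approx 206.06$, $K'_1 \approx 13.16$), and the gap between $f_1$ and $\min\{\rho_1, 1 - \rho_1\}L$ at $\rho_1 = 0.35$ is about 0.56, which is the offset shown in the figure.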



B. Proof of Claim 2

For any $\rho$ such that $J'(\rho) = 0$, the inequality holds trivially. For any $\rho$ such that $J'(\rho) > 0$ and for any action $a \in \mathcal{A}$ we have



i=l 



PiQti^) 



M log i^ + log e - log y^ ^^ „ ( . 



£) 



^V) - E 



M /gf(z)log 



max 



Efc^. T^9?(-) 



-(i^; 



(a) 



j=l 

M 



/^. 



>^'(p)-E^ 



E.^.T^^(5rik.") 



i=l 



Dr 



> J'ip) - 1, 
where (a) follows from Jensen's inequality.



C. Proof of Claim 3

For all $\rho$ satisfying $\max_i \rho_i \ge 1 - \delta$,

$$\begin{aligned}
H(\rho) &\le (1 - \delta) \log\frac{1}{1 - \delta} + (M - 1) \times \frac{\delta}{M - 1} \log\frac{M - 1}{\delta} \\
&= H([\delta, 1 - \delta]) + \delta \log(M - 1),
\end{aligned}$$

and hence, $J'' \le 0$. In other words, $J(\rho) = J''(\rho) > 0$ implies that $\max_i \rho_i < 1 - \delta$.
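The entropy bound $H(\rho) \le H([\delta, 1 - \delta]) + \delta \log(M - 1)$ for beliefs with $\max_i \rho_i \ge 1 - \delta$ can be spot-checked on random belief vectors. The sketch below is illustrative only: it assumes base-2 logarithms and $\delta \le (M-1)/M$, and the sampling scheme and helper names are hypothetical.

```python
import math
import random

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def claim3_bound(delta, M):
    """H([delta, 1 - delta]) + delta * log2(M - 1)."""
    return entropy([delta, 1.0 - delta]) + delta * math.log2(M - 1)

def random_belief_with_peak(M, peak, rng):
    """Belief with one entry equal to `peak`; the others share 1 - peak at random."""
    weights = [rng.random() for _ in range(M - 1)]
    scale = (1.0 - peak) / sum(weights)
    return [peak] + [w * scale for w in weights]

rng = random.Random(1)
delta, M = 0.1, 5
violations = 0
for _ in range(1000):
    peak = rng.uniform(1.0 - delta, 1.0)          # largest entry is at least 1 - delta
    rho = random_belief_with_peak(M, peak, rng)
    if entropy(rho) > claim3_bound(delta, M) + 1e-9:
        violations += 1
print("violations:", violations)
```

The bound is tight when the largest entry equals $1 - \delta$ exactly and the remaining mass $\delta$ is spread uniformly over the other $M - 1$ hypotheses.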

Let $\hat\rho = \Phi^a(\rho, z)$. Inequality (24) holds trivially if $\max_j \rho_j < 1 - \delta$ since $J(\rho) \ge J''(\rho)$ and $J''(\rho)$ is greater than or equal to the right-hand side of (24). If $\max_i \rho_i \ge 1 - \delta$, we get

J{p) = J'ip) 



■j^ log i^ + lege- log ^ 

2^ Pi 



(a) 
> 



■ M 



Dr 



pj- - K' 



log 



1-L- 



D. 



^2 



> 



log 



■ -1 



log^ 



£>. 



1^ 
^ -K' 
where (a) follows from the fact that under Assumption 2 and for all $i \in \Omega$,



log 



1 -Pi 



< 



< 



log 



pi 



1 -Pi 



log 



1-p.. 



log 



1 -Pl 



log ^''■^■''' -log " 



log 



Ej^iPjQji 
qf{z) 



<log^ + log 



1-6 



E,v. T^ci^i^) 



+ log 



1 -pi 
1-6 



log 



1-5 



(5 



6 



D. Proof of Claim 4

Following similar lines as the proof of Claim 1 we can select $K'_2$ sufficiently large such that $J'(\rho) \le \min_{j \in \Omega}(1 - \rho_j)L$. Recall that $J'(\rho) = [f_2(L, \rho) - K'_2]^+$, where

iog(L-i) + iog^ , E,t!ipaogi^ 



f2{L,p) 



Dr 



Dr 



(61) 
By the assumption of Claim 4 there exists $k > 0$ such that $\log M \le (I_{\max} - k)L$ for all $M$. Set $\alpha =$

max{l, |}, /3 = i (k - ^), and let L'^ = max{2a, |, ^}. For L > L'^ and for alH = 1, 2, . . . , M- 1, 
we obtain 

log ^fer + loge a log(M - 1) + 2 log f 



.f2[L,p)\ r g 1 ai — F; 1- 



■-(M-l)!.'"'' (M-l)L' i' 



i-^mn.rr i-^ i-^ r^ 



'max 



^ log ^^^^ + log ^ a(/^a^. -k) + 2 

~ D D 

^ log(2a - 1) + log^ + g (/;„,,. - k) + 2 



and, 



^^'(L,p)|^ . ^ <flog(M-l) + 21ogL + i^L')-l 



dp: 



max 



< ( 21ogL + (/„„,. - K + i^^)L^ 



a ' D, 



'max 



<L + {\ogL-f3L) ^ 



D 



max 



< L. (63) 

From symmetry and concavity of $f_2(L, \cdot)$, (62), (63), it is clear that for $L \ge L'_2$, and for all $\rho \in \mathbb{P}_L(\Theta)$,



/2(i^,p) 



log(2a - 1) + lege + a{Ir,^ax - k) + 2'' ^ 



D 



max 



< min(l - pj)L. (64) 



Furthermore, for $L < L'_2$ we have



... W/rr' ^^ ^ , ^, ., ^ l0gL^2 + loge + logM ^ log4 + loge + L'^jlmax " ^) 

f2{L,p) < f2{L2,p) < m_ax/2(L2,p) < < . 

P J-'max J-^max 

In other words, selecting 

, max{log L'2 + log^ + L'^ilmax - k), log(2a - 1) + log^ + a{Imax - k) + 2} 
^2 > 7^ ) (65) 

^max 

satisfies J'(p) < minjgQ(l — Pj)L. 

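Next we discuss the selection of $K'_2$ such that $J''(\rho) \le \min_{j \in \Omega}(1 - \rho_j)L$ is also satisfied. Let

$$f_3(L, \rho) := \frac{H(\rho) - H([\delta, 1 - \delta]) - \delta \log(M - 1)}{I_{\max}},$$

and rewrite,

$$J''(\rho) = \Bigg[f_3(L, \rho) + \frac{\log\frac{1 - L^{-1}}{L^{-1}} - \log\frac{1 - \delta}{\delta}}{D_{\max}}\, \mathbf{1}_{\{\max_i \rho_i < 1 - \delta\}} - K'_2\Bigg]^+.$$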
We show that at belief vector p = [p^^, • • • , ^m-ul ' ^ ~ x]' ^'^^ f°^ -^ - ^2 '■= max{^^^, %^},
/^(L, p) has an upper bound independent of L and its partial derivatives with respect to pi, i = 
1, 2, . . . , M — 1 are less than L. In other words, 

Jal-^; P) L,_r 0.5 0.5 1 0.5] 

f'~L(jv/-l)L '■■■' (A/-1)L '-^ il 

^([2r'i-2i]) + 2iMM-i)-i/([5,i-5])-5iog(M-i) 



/, 



max 



and, 



^ ' 2 V "^'l^ '^J 



T^ 1-^1 Pj L_r 0.5 0.5 1 0.5] — 7 iOg L_r 0.5 0.5 

api ip"— L(A/_i)L'---'(A/-i)L'-^ L i imax Pi '^~4A/-l)L '•••' (A/-1)L ' 

log ■ ^ 



/, 



0.5 
(M-l)L 



<-^(log(M-l) + logL) 

-'max 
< -J {L{Imax - k) + logL) 



'-max 



L-(-^L-logL) 



'-max 



<L. 



Furthermore, for L < L'2, we have 



r , T \ ^ -e I Til \ ^ t ( Til \ ^ °S M L2 [Imax — k) 

h{L,p) < h[L2,p) < max/3(L2,p) < — < 

P -I- m.ax -I- max. 



P 

We also note that for all $K'_2$ satisfying (65),



log 1^ - log 1^ 



l{maxp,<i-5} - K'2< J' [p] < niin(l - Pj)L. 



Dmax ^^^--^-"/ ^ - -^ - ^,0 

Thus, if $K'_2$ satisfies (65) as well as the following condition,



K2 > r , (66) 



we have 



.fT . , log^^-log¥ i ;.. 

/3l-t>, Pj H p; i{maxpi<l-5} " J^2 



'max 



< min(l - pj)L. 



By selecting $K'_2$ to be independent of $L$ and $M$ and larger than the right-hand sides of (65) and (66), we have the assertion of the claim. The following selection of $K'_2$ satisfies the above conditions:

r.1 ) log ^2 + log^ + ^2(^max ' k) + 2 AI max , /Lx , 1 1 f.^. 

K2 = max { , — — + -— + ) . (67) 

i^max "^ "^ —max 
E. Proof of Claim 5

Let $D_\lambda = \min_{i \ne j} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a)$. Here we bound $V_\lambda$ as follows:



Vx{p) < 



Vx{p) < 



2 + L-Mog(M-l) ■ 



M log- ^-^ - log- -2^ 
log ^_i log ^^ 



vp^PL(e) 
, vpGPL(e). 



. 'i=i 

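Following similar lines as the proof of Claim 1 we can show that

$$K'_\lambda = \frac{1 + \alpha\beta + \log L^* + \log M}{D_\lambda} \ge \frac{\max\{2 + L^{-1}\log(M-1),\ \log L^* + \log M,\ \log(2\alpha - 1) + \alpha\beta + 2\}}{D_\lambda}$$

ensures that $V_\lambda(\rho) \le \min_{j \in \Omega}(1 - \rho_j)L$, where $\alpha = \max\{1, \frac{2}{D_\lambda}\}$, $\beta = \frac{1}{3}\big(D_\lambda - \frac{\log e}{\alpha}\big)$, and $L^* = \max\{2\alpha, \frac{4}{\beta}, \frac{1}{\beta^2}, \frac{\log(M-1)}{\beta}\}$.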